Abstract

When dealing with incomplete data in statistical learning, or incomplete observations 
in probabilistic inference, one needs to distinguish the fact that a certain event is observed 
from the fact that the observed event has happened. Since the modeling and computational 
complexities entailed by maintaining this proper distinction are often prohibitive, one asks 
for conditions under which it can be safely ignored. Such conditions are given by the missing 
at random (mar) and coarsened at random (car) assumptions. In this paper we provide
an in-depth analysis of several questions relating to mar/car assumptions. The main purpose
of our study is to provide criteria by which one may evaluate whether a car assumption
is reasonable for a particular data collecting or observational process. This question is
complicated by the fact that several distinct versions of mar/car assumptions exist. We
therefore first provide an overview of these different versions, in which we highlight the
distinction between distributional and coarsening variable induced versions. We show that
distributional versions are less restrictive and sufficient for most applications. We then
address from two different perspectives the question of when the mar/car assumption is
warranted. First we provide a "static" analysis that characterizes the admissibility of the 
car assumption in terms of the support structure of the joint probability distribution of 
complete data and incomplete observations. Here we obtain an equivalence characterization 
that improves and extends a recent result by Grünwald and Halpern. We then turn to a
"procedural" analysis that characterizes the admissibility of the car assumption in terms 
of procedural models for the actual data (or observation) generating process. The main 
result of this analysis is that the stronger coarsened completely at random (ccar) condition 
is arguably the most reasonable assumption, as it alone corresponds to data coarsening 
procedures that satisfy a natural robustness property. 



1. Introduction 

Probabilistic models have become the preeminent tool for reasoning under uncertainty in
AI. A probabilistic model consists of a state space $W$, and a probability distribution over
the states $x \in W$. A given probabilistic model is used for probabilistic inference based on
observations. An observation determines a subset $U$ of $W$ to which the true state is now known
to belong. Probabilities are then updated by conditioning on $U$.

The required probabilistic models are often learned from empirical data using statistical
parameter estimation techniques. The data can consist of sampled exact states from $W$,
but more often it consists of incomplete observations, which only establish that the exact
data point $x$ belongs to a subset $U \subseteq W$. Both when learning a probabilistic model, and
when using it for probabilistic inference, one should, in principle, distinguish the event that
a certain observation $U$ has been made ("$U$ is observed") from the event that the true
state is a member of $U$ ("$U$ has occurred"). Ignoring this distinction in probabilistic



(c)2005 AI Access Foundation. All rights reserved. 



Jaeger 



inference can lead to flawed probability assignments by conditioning. Illustrations of this
are given by well-known probability puzzles like the Monty Hall problem or the three prisoners
paradox. Ignoring this distinction in statistical learning can lead to the construction
of models that do not fit the true distribution on $W$. In spite of these known difficulties,
one usually tries to avoid the extra complexity incurred by making the proper distinction
between "$U$ is observed" and "$U$ has occurred". In statistics there exists a sizable literature
on "ignorability" conditions that permit learning procedures to ignore this distinction. In
the AI literature dealing with probabilistic inference this topic has received rather scant
attention, though its importance was recognized early on (Shafer, 1985; Pearl, 1988). Recently,
however, Grünwald and Halpern (2003) have provided a more in-depth analysis of ignorability from
a probabilistic inference point of view.

The ignorability conditions required for learning and inference have basically the same 
mathematical form, which is expressed in the missing at random (mar) or coarsened at 
random (car) conditions. In this paper we investigate several questions relating to these 
formal conditions. The central theme of this investigation is to provide a deeper insight into 
what makes an observational process satisfy, or violate, the coarsened at random condition. 
This question is studied from two different angles. First (Section 3), we identify qualitative
properties of the joint distribution of true states and observations that make the car
assumption feasible at all. The qualitative properties we consider here are constraints on
which states and observations have nonzero probabilities. This directly extends the work of
Grünwald and Halpern (2003) (henceforth also referred to as GH). In fact, our main result
in Section 3 is an extension and improvement of one of the main results in GH. Second
(Section 4), we investigate general types of observational procedures that will lead to car
observations. This, again, directly extends some of the material in GH, as well as earlier
work by Gill, van der Laan & Robins (1997) (henceforth also referred to as GvLR). We
develop a formal framework that allows us to analyze previous and new types of procedural
models in a unified and systematic way. In particular, this framework allows us to specify
precise conditions for what makes certain types of observational processes "natural" or
"reasonable". The somewhat surprising result of this analysis is that the arguably most natural
classes of observational processes correspond exactly to those processes that will result in
observations that are coarsened completely at random (ccar), a strengthened version of
car that has often been considered an unrealistically strong assumption.

2. Fundamentals of Coarse Data and Ignorability 

There exist numerous definitions in the literature of what it means that data is missing or
coarsened at random (Rubin, 1976; Dawid & Dickey, 1977; Heitjan & Rubin, 1991; Heitjan,
1994, 1997; Gill et al., 1997; Grünwald & Halpern, 2003). While all capture the same
basic principle, various definitions are subtly different in a way that can substantially affect
their implications. In Section 2.1 we give a fairly comprehensive overview of the variant
definitions, and analyze their relationships. In this survey we aim at providing a uniform
framework and terminology for different mar/car variants. Definitions are attributed to
those earlier sources where their basic content first appeared, even though our definitions
and our terminology can differ in some details from the original versions (cf. also the remarks
at the end of Section 2.1).
Special emphasis is placed on the distinction between distributional and coarsening
variable induced versions of car. In this paper the main focus will then be on distributional
versions. In Section 2.2 we summarize results showing that distributional car is sufficient
to establish ignorability for probabilistic inference.

2.1 Defining Car 

We begin with the concepts introduced by Rubin (1976) for the special case of data with
missing values. Assume that we are concerned with a multivariate random variable
$X = (X_1, \ldots, X_k)$, where each $X_i$ takes values in a finite state space $V_i$. Observations of $X$ are
incomplete, i.e. we observe values $y = (y_1, \ldots, y_k)$, where each $y_i$ can be either the value
$x_i \in V_i$ of $X_i$, or the special 'missingness symbol' $*$. One can view $y$ as the realization of
a random variable $Y$ that is a function of $X$ and a missingness indicator, $M$, which is a
random variable with values in $\{0,1\}^k$:

$$Y = f(X, M), \tag{1}$$

where $y = f(x, m)$ is defined by

$$y_i = \begin{cases} x_i & \text{if } m_i = 0 \\ * & \text{if } m_i = 1. \end{cases} \tag{2}$$

Rubin's (1976) original definition of missing at random is a condition on the conditional
distribution of $M$: the data is missing at random iff for all $y$ and all $m$:

$$P(M = m \mid X) \text{ is constant on } \{x \mid P(X = x) > 0,\ f(x, m) = y\}. \tag{3}$$

We refer to this condition as the M-mar condition, to indicate the fact that it is expressed
in terms of the missingness indicator $M$.

Example 2.1 Let $X = (X_1, X_2)$ with $V_1 = V_2 = \{p, n\}$. We interpret $X_1, X_2$ as two
medical tests with possible outcomes positive or negative. Suppose that test $X_1$ always is
performed first on a patient, and that test $X_2$ is performed if and only if $X_1$ comes out
positive. Possible observations that can be made then are

$$(n, *) = f((n,n), (0,1)) = f((n,p), (0,1)),$$
$$(p, n) = f((p,n), (0,0)),$$
$$(p, p) = f((p,p), (0,0)).$$

For $y = (n, *)$ and $m = (0, 1)$ we obtain

$$P(M = m \mid X = (n,n)) = P(M = m \mid X = (n,p)) = 1,$$

so that (3) is satisfied. For other values of $y$ and $m$ condition (3) trivially holds, because
the sets of $x$-values in (3) then are singletons (or empty).
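
Condition (3) can also be checked mechanically. The following is a minimal sketch, not part of the original development: it assumes a uniform prior on the four test outcomes (the example itself does not fix the distribution of X), and the names `is_m_mar`, `sequential_tests` and `self_censoring` are hypothetical. The second mechanism is an invented contrast case in which the second test is hidden exactly when it is positive.

```python
from fractions import Fraction
from itertools import product

STAR = "*"  # the missingness symbol


def f(x, m):
    """Equation (2): hide component i of x whenever m[i] == 1."""
    return tuple(STAR if mi == 1 else xi for xi, mi in zip(x, m))


def is_m_mar(p_x, p_m_given_x):
    """Condition (3): for each m, P(M = m | X = x) must be constant over all
    x with P(X = x) > 0 that share the same observation y = f(x, m)."""
    support = [x for x, px in p_x.items() if px > 0]
    k = len(support[0])
    for m in product((0, 1), repeat=k):
        values_by_y = {}
        for x in support:
            prob = p_m_given_x(x).get(m, Fraction(0))
            values_by_y.setdefault(f(x, m), set()).add(prob)
        if any(len(values) > 1 for values in values_by_y.values()):
            return False
    return True


def sequential_tests(x):
    """Example 2.1: X2 is observed if and only if X1 is positive."""
    return {(0, 0): Fraction(1)} if x[0] == "p" else {(0, 1): Fraction(1)}


def self_censoring(x):
    """Hypothetical contrast: X2 is hidden exactly when X2 itself is positive."""
    return {(0, 1): Fraction(1)} if x[1] == "p" else {(0, 0): Fraction(1)}


# Assumption: a uniform prior over the four outcome pairs.
P_X = {x: Fraction(1, 4) for x in product("pn", repeat=2)}

print(is_m_mar(P_X, sequential_tests))
print(is_m_mar(P_X, self_censoring))
```

Under these assumptions the sequential testing procedure passes the check, while the self-censoring mechanism fails it for the observation (n, *).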

We can also eliminate the random vector $M$ from the definition of mar, and formulate
a definition directly in terms of the joint distribution of $Y$ and $X$. For this, observe that each
observed $y$ can be identified with the set

$$U(y) := \{x \mid \text{for all } i:\ y_i \neq * \Rightarrow x_i = y_i\}. \tag{4}$$






The set $U(y)$ contains the complete data values consistent with the observed $y$. We can
now rephrase M-mar as

$$P(Y = y \mid X) \text{ is constant on } \{x \mid P(X = x) > 0,\ x \in U(y)\}. \tag{5}$$

We call this the distributional mar condition, abbreviated d-mar, because it is in terms of
the joint distribution of the complete data $X$, and the observed data $Y$.

Example 2.2 (continued from Example 2.1) We have

$$U((n,*)) = \{(n,n), (n,p)\}, \quad U((p,n)) = \{(p,n)\}, \quad U((p,p)) = \{(p,p)\}.$$

Now we compute

$$P(Y = (n,*) \mid X = (n,n)) = P(Y = (n,*) \mid X = (n,p)) = 1.$$

Together with the (again trivial) conditions for the two other possible $Y$-values, this shows
(5).

M-mar and d-mar are equivalent, because given $X$ there is a one-to-one correspondence
between $M$ and $Y$, i.e. there exists a function $h$ such that for all $x, y$ with $x \in U(y)$:

$$y = f(x,m) \Leftrightarrow m = h(y) \tag{6}$$

($h$ simply translates $y$ into a $\{0,1\}$-vector by replacing occurrences of $*$ with 1, and all other
values in $y$ with 0). Using (6) one can easily derive a one-to-one correspondence between
conditions (3) and (5), and hence obtain the equivalence of M-mar and d-mar.
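
The correspondence (6) can be verified by brute force on a small state space. The sketch below reads (6) as an equivalence and uses the two-test state space of Example 2.1; the function names `consistent_set` and `check_correspondence` are hypothetical helpers, not notation from the text.

```python
from itertools import product

STAR = "*"  # the missingness symbol


def f(x, m):
    """Hide component i of x whenever m[i] == 1."""
    return tuple(STAR if mi == 1 else xi for xi, mi in zip(x, m))


def consistent_set(y, domains):
    """U(y) from (4): all complete vectors agreeing with y wherever y is observed."""
    return {x for x in product(*domains)
            if all(yi == STAR or xi == yi for xi, yi in zip(x, y))}


def h(y):
    """The translation described after (6): 1 where y shows '*', 0 elsewhere."""
    return tuple(1 if yi == STAR else 0 for yi in y)


def check_correspondence(domains):
    """Check y = f(x, m) <=> m = h(y) for every y, every x in U(y), every m."""
    k = len(domains)
    for y in product(*[tuple(d) + (STAR,) for d in domains]):
        for x in consistent_set(y, domains):
            for m in product((0, 1), repeat=k):
                if (f(x, m) == y) != (m == h(y)):
                    return False
    return True


DOMAINS = [("p", "n"), ("p", "n")]

print(consistent_set(("n", STAR), DOMAINS))
print(h(("n", STAR)))
print(check_correspondence(DOMAINS))
```

The check relies on the assumption that the symbol `*` is not itself a value in any of the state spaces.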

One advantage of M-mar is that it easily leads to the strengthened condition of missing
completely at random (Rubin, 1976):

$$P(M = m \mid X) \text{ is constant on } \{x \mid P(X = x) > 0\}. \tag{7}$$

We refer to this as the M-mcar condition.

Example 2.3 (continued from Example 2.2) We obtain

$$P(M = (0,1) \mid X = (n,p)) = 1 \neq 0 = P(M = (0,1) \mid X = x) \quad \text{for } x \in \{(p,n), (p,p)\}.$$

Thus, the observations here are not M-mcar.
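
The failure of M-mcar for the sequential testing procedure can be confirmed with a direct check of condition (7). This is a sketch under the same assumption of a uniform prior on the four outcomes, which the example does not specify; `is_m_mcar` and `coin_flip` are hypothetical names, and the coin-flip mechanism is an invented contrast case that does satisfy (7).

```python
from fractions import Fraction
from itertools import product


def is_m_mcar(p_x, p_m_given_x):
    """Condition (7): for every m, P(M = m | X = x) is the same for all x
    with P(X = x) > 0."""
    support = [x for x, px in p_x.items() if px > 0]
    k = len(support[0])
    for m in product((0, 1), repeat=k):
        if len({p_m_given_x(x).get(m, Fraction(0)) for x in support}) > 1:
            return False
    return True


def sequential_tests(x):
    """Examples 2.1 to 2.3: X2 is observed if and only if X1 is positive."""
    return {(0, 0): Fraction(1)} if x[0] == "p" else {(0, 1): Fraction(1)}


def coin_flip(x):
    """Hypothetical contrast: X2 is dropped with probability 1/2, whatever x is."""
    return {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 2)}


# Assumption: a uniform prior over the four outcome pairs.
P_X = {x: Fraction(1, 4) for x in product("pn", repeat=2)}

print(is_m_mcar(P_X, sequential_tests))
print(is_m_mcar(P_X, coin_flip))
```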

A distributional version of mcar is slightly more complex, and we defer its statement to 
the more general case of coarse data, which we now turn to. 

Missing attribute values are only one special way in which observations can be incomplete.
Other possibilities include imperfectly observed values (e.g. $X_i$ is only known to be
either $x \in V_i$ or $x' \in V_i$), partly attributed values (e.g. for $x \in V_i = V_j$ it is only known
that $X_i = x$ or $X_j = x$), etc. In all cases, the incomplete observation of $X$ defines the set
of possible instantiations of $X$ that are consistent with the observation. This leads to the
general concept of coarse data (Heitjan & Rubin, 1991), which generalizes the concept of
missing data to observations of arbitrary subsets of the state space. In this general setting
it is convenient to abstract from the particular structure of the state space as a product
$\times_{i=1}^{k} V_i$ induced by a multivariate $X$, and instead just assume a univariate random variable
$X$ taking values in a set $W = \{x_1, \ldots, x_n\}$ (of course, this does not preclude the possibility
that in fact $W = \times_{i=1}^{k} V_i$). Abstracting from the missingness indicator $M$, one can imagine
coarse data as being produced by $X$ and a coarsening variable $G$. Again, one can also take
the coarsening variable $G$ out of the picture, and model coarse data directly as the joint
distribution of $X$ and a random variable $Y$ (the observed data) with values in $2^W$. This is
the view we will mostly adopt, and therefore the motivation for the following definition.

Definition 2.4 Let $W = \{x_1, \ldots, x_n\}$. The coarse data space for $W$ is

$$\Omega(W) := \{(x, U) \mid x \in W,\ U \subseteq W : x \in U\}.$$

A coarse data distribution is any probability distribution $P$ on $\Omega(W)$.

A coarse data distribution can be seen as the joint distribution $P(X, Y)$ of a random
variable $X$ with values in $W$, and a random variable $Y$ with values in $2^W \setminus \emptyset$. The joint
distribution of $X$ and $Y$ is constrained by the condition $X \in Y$. Note that, thus, coarse
data spaces and coarse data distributions actually represent both the true complete data
and its coarsened observation. In the remainder of this paper, $P$ without any arguments
will always denote a coarse data distribution in the sense of Definition 2.4, and can be
used interchangeably with $P(X, Y)$. When we need to refer to (joint) distributions of other
random variables, then these are listed explicitly as arguments of $P$. E.g.: $P(X, G)$ is the
joint distribution of $X$ and $G$.

Coarsening variables as introduced by the following definition are a means for specifying 
the conditional distribution of Y given X. 

Definition 2.5 Let $G$ be a random variable with values in a finite state space $\Gamma$, and

$$f : W \times \Gamma \to 2^W \setminus \emptyset, \tag{8}$$

such that

• for all $x$ with $P(X = x) > 0$: $x \in f(x,g)$;

• for all $x, x'$ with $P(X = x) > 0$, $P(X = x') > 0$, all $U \in 2^W \setminus \emptyset$, and all $g \in \Gamma$:

$$f(x,g) = U,\ x' \in U \ \Rightarrow\ f(x',g) = U. \tag{9}$$

We call the pair $(G, f)$ a coarsening variable for $X$. Often we also refer to $G$ alone as a
coarsening variable, in which case the function $f$ is assumed to be implicitly given.

A coarse data distribution $P$ is induced by $X$ and $(G, f)$ if $P$ is the joint distribution
of $X$ and $f(X, G)$.

The condition (9) has not always been made explicit in the introduction of coarsening
variables. However, as noted by Heitjan (1997), it is usually implied in the concept of
a coarsening variable. GvLR (pp. 283-285) consider a somewhat more general setup in
which $f(x,g)$ does not take values in $2^W$ directly, but $y = f(x,g)$ is some observable
from which $U = \alpha(y) \subseteq W$ is obtained via a further mapping $\alpha$. The introduction of
such an intermediate observable $Y$ is necessary, for example, when dealing with real-valued
random variables $X$. Since we then will not have any statistically tractable models for
general distributions on $2^W$, a parameterization $Y$ for a small subset of $2^W$ is needed. For
example, $Y$ could take values in $\mathbb{R} \times \mathbb{R}$, and $\alpha(y_1, y_2)$ might be defined as the interval
$[\min\{y_1, y_2\}, \max\{y_1, y_2\}]$. GvLR do not require (9) in general; instead they call $f$ Cartesian
when (9) is satisfied.

The following definition generalizes property (6) of missingness indicators to arbitrary 
coarsening variables. 

Definition 2.6 The coarsening variable $(G, f)$ is called invertible if there exists a function

$$h : 2^W \setminus \emptyset \to \Gamma, \tag{10}$$

such that for all $x, U$ with $x \in U$, and all $g \in \Gamma$:

$$U = f(x,g) \Leftrightarrow g = h(U). \tag{11}$$

An alternative reading of (11) is that $G$ is observable: from the coarse observation $U$
the value $g \in \Gamma$ can be reconstructed, so that $G$ can be treated as a fully observable random
variable.

We can now generalize the definition of missing (completely) at random to the coarse 
data setting. We begin with the generalization of M-mar. 

Definition 2.7 (Heitjan, 1997) Let $G$ be a coarsening variable for $X$. The joint distribution
$P(X, G)$ is G-car if for all $U \subseteq W$, and $g \in \Gamma$:

$$P(G = g \mid X) \text{ is constant on } \{x \mid P(X = x) > 0,\ f(x, g) = U\}. \tag{12}$$

By marginalizing out the coarsening variable $G$ (or by not assuming a variable $G$ in the
first place), we obtain the following distributional version of car.

Definition 2.8 (Heitjan & Rubin, 1991) Let $P$ be a coarse data distribution. $P$ is d-car if
for all $U \subseteq W$

$$P(Y = U \mid X) \text{ is constant on } \{x \mid P(X = x) > 0,\ x \in U\}. \tag{13}$$

If X is multivariate, and incompleteness of observations consists of missing values, then 
d-car coincides with d-mar, and M-car with M-mar. 
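
For a finite state space, condition (13) can be tested directly on a table of joint probabilities. The sketch below is illustrative only: it represents a coarse data distribution as a dictionary from pairs (x, U) to probabilities, with U a frozenset, which is a representation chosen here and not one used in the text. The first distribution is that of Example 2.1 under an assumed uniform prior; the second is a made-up distribution that violates d-car.

```python
from fractions import Fraction


def marginal_x(p):
    """Marginal P(X = x) of a coarse data distribution p[(x, U)]."""
    p_x = {}
    for (x, U), q in p.items():
        if x not in U:
            raise ValueError("a coarse data distribution requires x in U")
        p_x[x] = p_x.get(x, Fraction(0)) + q
    return p_x


def is_d_car(p):
    """Condition (13): for every U, P(Y = U | X = x) is constant over all
    x in U with P(X = x) > 0. Sets U that never occur are trivially fine."""
    p_x = marginal_x(p)
    for U in {U for (_, U) in p}:
        conditionals = {p.get((x, U), Fraction(0)) / p_x[x]
                        for x in U if p_x.get(x, 0) > 0}
        if len(conditionals) > 1:
            return False
    return True


NN, NP, PN, PP = ("n", "n"), ("n", "p"), ("p", "n"), ("p", "p")
quarter = Fraction(1, 4)

# Example 2.1 as a coarse data distribution (uniform prior is an assumption).
MEDICAL = {
    (NN, frozenset({NN, NP})): quarter,
    (NP, frozenset({NN, NP})): quarter,
    (PN, frozenset({PN})): quarter,
    (PP, frozenset({PP})): quarter,
}

# Hypothetical violation: {a, b} is always reported for a, but only half the time for b.
SKEWED = {
    ("a", frozenset({"a", "b"})): Fraction(1, 2),
    ("b", frozenset({"a", "b"})): Fraction(1, 4),
    ("b", frozenset({"b"})): Fraction(1, 4),
}

print(is_d_car(MEDICAL))
print(is_d_car(SKEWED))
```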

Condition (12) refers to the joint distribution of $X$ and $G$, condition (13) to the joint
distribution of $X$ and $Y$. Since $Y$ is a function of $X$ and $G$, one can always determine
from the joint distribution of $X$ and $G$ whether d-car holds for their induced coarse data
distribution. Conversely, when only the coarse data distribution $P(X, Y)$ and a coarsening
variable $G$ inducing $P(X, Y)$ are given, it is in general not possible to determine whether
$P(X, G)$ is G-car, because the joint distribution $P(X, G)$ cannot be reconstructed from
the given information. However, under suitable assumptions on $G$ it is possible to infer
that $P(X, G)$ is G-car only from the induced $P(X, Y)$ being d-car. With the following
two theorems we clarify these relationships between G-car and d-car. These theorems are
essentially restatements in our conceptual framework of results already given by GvLR (pp.
284-285).
Theorem 2.9 A coarse data distribution $P(X, Y)$ is d-car iff there exists a coarsening
variable $G$ inducing $P(X, Y)$, such that $P(X, G)$ is G-car.

Proof: First assume that $P(X, Y)$ is d-car. We construct a canonical coarsening variable
$G$ inducing $P(X, Y)$ as follows: let $\Gamma = 2^W \setminus \emptyset$ and $f(x, U) := U$ for all $x \in W$ and $U \in \Gamma$.
Define a $\Gamma$-valued coarsening variable $G$ by $P(G = U \mid X = x) := P(Y = U \mid X = x)$.
Clearly, the coarse data distribution induced by $G$ is the original $P(X, Y)$, and $P(X, G)$ is
G-car.

Conversely, assume that $P(X, G)$ is G-car for some $G$ inducing $P(X, Y)$. Let $U \subseteq W$,
$x \in U$. Then

$$P(Y = U \mid X = x) = P(G \in \{g \in \Gamma \mid f(x,g) = U\} \mid X = x) = \sum_{g \in \Gamma :\, f(x,g) = U} P(G = g \mid X = x).$$

Because of (9) the summation here is over the same values $g \in \Gamma$ for all $x \in U$. Because
of G-car, the conditional probabilities $P(G = g \mid X = x)$ are constant for $x \in U$. Thus
$P(Y = U \mid X = x)$ is constant for $x \in U$, i.e. d-car holds. □

The following example shows that d-car does not in general imply G-car, and that a
fixed coarse data distribution $P(X, Y)$ can be induced both by a coarsening variable for
which G-car holds, and by another coarsening variable for which G-car does not hold.

Example 2.10 (continued from Example 2.3) We have already seen that the coarse data
distribution here is d-mar and M-mar, and hence d-car and M-car.

$M$ is not the only coarsening variable inducing $P(X, Y)$. In fact, it is not even the
simplest: let $G_1$ be a trivial random variable that can only assume one state, i.e. $\Gamma_1 = \{g\}$.
Define $f_1$ by

$$f_1((n,n), g) = f_1((n,p), g) = \{(n,n), (n,p)\},$$
$$f_1((p,n), g) = \{(p,n)\},$$
$$f_1((p,p), g) = \{(p,p)\}.$$

Then $G_1$ induces $P(X, Y)$, and $P(X, G_1)$ also is trivially G-car.

Finally, let $G_2$ be defined by $\Gamma_2 = \{g_1, g_2\}$ and $f_2(x, g_i) = f_1(x, g)$ for all $x \in W$ and
$i = 1, 2$. Thus, $G_2$ is just like $G_1$, but the trivial state space of $G_1$ has been split into two
elements with identical meaning. Let the conditional distribution of $G_2$ given $X$ be

$$P(G_2 = g_1 \mid X = (n,n)) = P(G_2 = g_2 \mid X = (n,p)) = 2/3,$$
$$P(G_2 = g_2 \mid X = (n,n)) = P(G_2 = g_1 \mid X = (n,p)) = 1/3,$$
$$P(G_2 = g_1 \mid X = (p,n)) = P(G_2 = g_1 \mid X = (p,p)) = 1.$$

Again, $G_2$ induces $P(X, Y)$. However, $P(X, G_2)$ is not G-car, because

$$f_2((n,n), g_1) = f_2((n,p), g_1) = \{(n,n), (n,p)\},$$
$$P(G_2 = g_1 \mid X = (n,n)) \neq P(G_2 = g_1 \mid X = (n,p))$$

violates the G-car condition. $G_2$ is not invertible in the sense of Definition 2.6: when, for
example, $U = \{(n,n), (n,p)\}$ is observed, it is not possible to determine whether the value
of $G_2$ was $g_1$ or $g_2$.
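
The claims of this example can be recomputed numerically. The following sketch assumes a uniform prior on the four outcomes (the example leaves the distribution of X open) and uses hypothetical helper names `induced` and `is_g_car`. It builds the coarse data distribution induced by the trivial variable and by the split variable, and then checks condition (12) for each.

```python
from fractions import Fraction

NN, NP, PN, PP = ("n", "n"), ("n", "p"), ("p", "n"), ("p", "p")
W = [NN, NP, PN, PP]
P_X = {x: Fraction(1, 4) for x in W}  # assumption: uniform prior


def f1(x, g):
    """f_1 and f_2 of the example: the value of g does not affect the outcome."""
    return frozenset({NN, NP}) if x[0] == "n" else frozenset({x})


# Conditional distributions P(G = g | X = x) of the two coarsening variables.
G1 = {x: {"g": Fraction(1)} for x in W}
G2 = {
    NN: {"g1": Fraction(2, 3), "g2": Fraction(1, 3)},
    NP: {"g1": Fraction(1, 3), "g2": Fraction(2, 3)},
    PN: {"g1": Fraction(1)},
    PP: {"g1": Fraction(1)},
}


def induced(p_x, p_g_given_x, f):
    """Joint distribution of X and Y = f(X, G)."""
    p = {}
    for x, px in p_x.items():
        for g, pg in p_g_given_x[x].items():
            key = (x, f(x, g))
            p[key] = p.get(key, Fraction(0)) + px * pg
    return p


def is_g_car(p_x, p_g_given_x, f):
    """Condition (12): for each g, P(G = g | X = x) is constant over all x
    with P(X = x) > 0 that are mapped to the same set U = f(x, g)."""
    support = [x for x, px in p_x.items() if px > 0]
    states = {g for x in support for g in p_g_given_x[x]}
    for g in states:
        values_by_U = {}
        for x in support:
            prob = p_g_given_x[x].get(g, Fraction(0))
            values_by_U.setdefault(f(x, g), set()).add(prob)
        if any(len(values) > 1 for values in values_by_U.values()):
            return False
    return True


print(induced(P_X, G1, f1) == induced(P_X, G2, f1))
print(is_g_car(P_X, G1, f1), is_g_car(P_X, G2, f1))
```

Both variables induce the same coarse data distribution, yet only the trivial one satisfies G-car, which matches the conclusion of the example.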
The following theorem shows that the non-invertibility of $G_2$ in the preceding example
is the reason why we cannot deduce G-car for $P(X, G_2)$ from the d-car property of the
induced $P(X, Y)$. This theorem completes our picture of the G-car/d-car relationship.

Theorem 2.11 Let $P(X, Y)$ be a coarse data distribution, $G$ an invertible coarsening
variable inducing $P(X, Y)$. If $P(X, Y)$ is d-car, then $P(X, G)$ is G-car.

Proof: Let $U \subseteq W$, $g \in \Gamma$, and $x \in U$, such that $P(X = x) > 0$ and $f(x,g) = U$. Since $G$
is invertible, we have that $f(x,g') \neq U$ for all $g' \neq g$, and hence

$$P(G = g \mid X = x) = P(Y = U \mid X = x).$$

From the assumption that $P$ is d-car it follows that the right-hand probability is constant
for $x \in U$, and hence the same holds for the left-hand side, i.e. G-car holds. □

We now turn to coarsening completely at random (ccar). It is straightforward to gen- 
eralize the definition of M-mcar to general coarsening variables: 

Definition 2.12 (Heitjan, 1994) Let $G$ be a coarsening variable for $X$. The joint
distribution $P(X, G)$ is G-ccar if for all $g \in \Gamma$

$$P(G = g \mid X) \text{ is constant on } \{x \mid P(X = x) > 0\}. \tag{14}$$

A distributional version of ccar does not seem to have been formalized previously in the
literature. GvLR refer to coarsening completely at random, but do not provide a formal
definition. However, it is implicit in their discussion that they have in mind a slightly
restricted version of our following definition (the restriction being a limitation to the case
$k = 1$ in Theorem 2.14 below).

We first observe that one cannot give a definition of d-ccar as a variant of Definition 2.12
in the same way as Definition 2.8 varies Definition 2.7, because that would lead us to the
condition that $P(Y = U \mid X)$ is constant on $\{x \mid P(X = x) > 0\}$. This would be inconsistent
with the existence of $x \in W \setminus U$ with $P(X = x) > 0$. However, the real semantic core of
d-car, arguably, is not so much captured by Definition 2.8, as by the characterization given
in Theorem 2.9. For d-ccar, therefore, we make an analogous characterization the basis of
the definition:

Definition 2.13 A coarse data distribution $P(X, Y)$ is d-ccar iff there exists a coarsening
variable $G$ inducing $P(X, Y)$, such that $P(X, G)$ is G-ccar.

The following theorem provides a constructive characterization of d-ccar. 

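Theorem 2.14 A coarse data distribution $P(X, Y)$ is d-ccar iff there exists a family
$\{\mathcal{W}_1, \ldots, \mathcal{W}_k\}$ of partitions of $W$, and a probability distribution $(\lambda_1, \ldots, \lambda_k)$ on
$(\mathcal{W}_1, \ldots, \mathcal{W}_k)$, such that for all $x \in W$ with $P(X = x) > 0$:

$$P(Y = U \mid X = x) = \sum_{\substack{i \in 1, \ldots, k \\ x \in U \in \mathcal{W}_i}} \lambda_i. \tag{15}$$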
Figure 1: Versions of car. (a): there exists $G$ such that this implication holds; (b): for all
invertible $G$ this implication holds; (c): equivalence holds for $G = M$.



Proof: Assume that $P$ is d-ccar. Let $G$ be a coarsening variable inducing $P(X, Y)$, such
that $P(X, G)$ is G-ccar. Because of (9), each value $g_i \in \Gamma$ induces a partition
$\mathcal{W}_i = \{U_{i,1}, \ldots, U_{i,k(i)}\}$, such that $f(x, g_i) = U_{i,j} \Leftrightarrow x \in U_{i,j}$. The partitions $\mathcal{W}_i$ together with
$\lambda_i := P(G = g_i \mid X)$ then provide a representation of $P(X, Y)$ in the form (15).

Conversely, if $P(X, Y)$ is given by (15) via partitions $\mathcal{W}_1, \ldots, \mathcal{W}_k$ and parameters $\lambda_i$,
one defines a coarsening variable $G$ with $\Gamma = \{1, \ldots, k\}$, $P(G = i \mid X = x) = \lambda_i$ for all $x$
with $P(X = x) > 0$, and $f(x, i)$ as that $U \in \mathcal{W}_i$ containing $x$. $P(X, G)$ then is G-ccar and
induces $P(X, Y)$, and hence $P(X, Y)$ is d-ccar. □
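
The representation (15) is constructive: given partitions and weights, the conditional distribution of Y follows by adding up the weights of the partitions whose block around x equals U. The sketch below illustrates this with a made-up four-element state space and two made-up partitions; the names `block_of` and `ccar_conditional` are hypothetical.

```python
from fractions import Fraction


def block_of(partition, x):
    """Return the unique block of the partition that contains x."""
    (block,) = [U for U in partition if x in U]
    return block


def ccar_conditional(partitions, weights, x):
    """Right-hand side of (15) for every U: P(Y = U | X = x) is the total
    weight of those partitions in which U is the block containing x."""
    dist = {}
    for partition, lam in zip(partitions, weights):
        U = block_of(partition, x)
        dist[U] = dist.get(U, Fraction(0)) + lam
    return dist


# Hypothetical state space, partitions, and weights, chosen for illustration.
W = ("a", "b", "c", "d")
PARTITIONS = [
    [frozenset("ab"), frozenset("cd")],
    [frozenset("a"), frozenset("b"), frozenset("cd")],
]
WEIGHTS = [Fraction(1, 3), Fraction(2, 3)]

for x in W:
    print(x, ccar_conditional(PARTITIONS, WEIGHTS, x))
```

In this illustration the set {a, b} is reported with probability 1/3 for both a and b, and {c, d} with probability 1 for both c and d, so the resulting conditional distributions also satisfy d-car. With a single partition (k = 1) the observation becomes a deterministic function of x, as in the medical test example.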

As before, we have that the G-ccar property of P{X, G) cannot be determined from the 
induced coarse data distribution: 



Example 2.15 (continuation of Example 2.10) $P(X, Y)$ is d-ccar and induced by any of
the three coarsening variables $M$, $G_1$, $G_2$. However, $P(X, G_1)$ is G-ccar, while $P(X, M)$
and $P(X, G_2)$ are not.

The previous example also shows that no analog of Theorem 2.11 holds for ccar: $M$
is invertible, but from d-ccar for the induced $P(X, Y)$ we here cannot infer G-ccar for
$P(X, M)$.

Figure 1 summarizes the different versions of mar/car we have considered. The
distributional versions d-car and d-ccar are weaker than their M- and G-counterparts, and
therefore the less restrictive assumptions. At the same time, they are sufficient to establish
ignorability for most statistical learning and probabilistic inference tasks. For the case of 
probabilistic inference this will be detailed by Theorem 2.18 in the following section. For 
statistical inference problems, too, the required ignorability results can be obtained from 
the distributional car versions, unless a specific coarsening variable is explicitly part of the 
inference problem. Whenever a coarsening variable G is introduced only as an artificial 
construct for modeling the connection between incomplete observations and complete data, 
one must be aware that the G-car and G-ccar conditions can be unnecessarily restrictive, 
and may lead us to reject ignorability when, in fact, ignorability holds (cf. Examples 2.3 
and 2.15). 
We conclude this section with three additional important remarks on definitions of car,
which are needed to complete the picture of different approaches to car in the literature:

Remark 1: All of the definitions given here are weak versions of mar/car. Corresponding
strong versions are obtained by dropping the restriction $P(X = x) > 0$ from
(3), (5), (7), (12), (13), (14), respectively (15). Differences between weak and strong versions
of car are studied in previous work (Jaeger, 2005). The results obtained there indicate that
in the context of probability updating the weak versions are more suitable. For this reason
we do not go into the details of strong versions here.

Remark 2: Our definitions of car differ from those originally given by Rubin and Heitjan
in that our definitions are "global" definitions that view mar/car as a property of a
joint distribution of complete and coarse data. The original definitions, on the other hand,
are conditional on a single observation $Y = U$, and do not impose constraints on the joint
distribution of $X$ and $Y$ for other values of $Y$. These "local" mar/car assumptions are
all that is required to justify the application of certain probabilistic or statistical inference
techniques to the single observation $Y = U$. The global mar/car conditions we stated justify
these inference techniques as general strategies that would be applied to any possible
observation. Local versions of car are more natural under a Bayesian statistical philosophy,
whereas global versions are required under a frequentist interpretation. Global versions of
car have also been used in other works (e.g., Jacobsen & Keiding, 1995; Gill et al., 1997;
Nielsen, 1997; Cator, 2004).

Remark 3: The definitions and results stated here are strictly limited to the case of
finite $W$. As already indicated in the discussion following Definition 2.5, extensions of car
to more general state spaces typically require a setup in which observations are modeled
by a random variable taking values in a more manageable state space than $2^W$. Several such
formalizations of car for continuous state spaces have been investigated (e.g., Jacobsen &
Keiding, 1995; Gill et al., 1997; Nielsen, 2000; Cator, 2004).

2.2 Ignorability

Car and mar assumptions are needed for ignoring the distinction between "$U$ is observed"
and "$U$ has occurred" in statistical inference and probability updating. In statistical
inference, for example, d-car is required to justify likelihood maximizing techniques like the
EM algorithm (Dempster, Laird, & Rubin, 1977) for learning from incomplete data. In this
paper the emphasis is on probability updating. We therefore briefly review the significance
of car in this context. We use the well-known Monty Hall problem.

Example 2.16 A contestant at a game show is asked to choose one of three closed doors
$A$, $B$, $C$, behind one of which is hidden a valuable prize, the others each hiding a goat. The
contestant chooses door $A$, say. The host now opens door $B$, revealing a goat. At this point
the contestant is allowed to change her choice from $A$ to $C$. Would this be advantageous?

Being a savvy probabilistic reasoner, the contestant knows that she should analyze the
situation using the coarse data space $\Omega(\{A, B, C\})$, and compute the probabilities

$$P(X = A \mid Y = \{A, C\}), \quad P(X = C \mid Y = \{A, C\}).$$

She makes the following assumptions: 1. A priori all doors are equally likely to hide the
prize. 2. Independent of the contestant's choice, the host will always open one door. 3.
The host will never open the door chosen by the contestant. 4. The host will never open
the door hiding the prize. 5. If more than one possible door remains for the host, he will
determine by a fair coin flip which one to open. From this, the contestant first obtains

$$P(Y = \{A,C\} \mid X = A) = 1/2, \quad P(Y = \{A,C\} \mid X = C) = 1, \tag{16}$$

and then

$$P(X = A \mid Y = \{A,C\}) = 1/3, \quad P(X = C \mid Y = \{A,C\}) = 2/3.$$

The conclusion, thus, is that it will be advantageous to switch to door $C$. A different
conclusion is obtained by simply conditioning in the state space $W$ on "$\{A, C\}$ has occurred":

$$P(X = A \mid X \in \{A,C\}) = 1/2, \quad P(X = C \mid X \in \{A,C\}) = 1/2.$$
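
The two computations of this example can be reproduced with exact arithmetic. The sketch below takes the likelihoods in (16) as given, adds the value 0 for door B (which follows from assumption 4, since the host never opens the prize door), and applies Bayes' rule; the function names are hypothetical.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")
PRIOR = {door: Fraction(1, 3) for door in DOORS}  # assumption 1 of the example

# P(Y = {A, C} | X = x): the host opens B, after the contestant has chosen A.
LIKELIHOOD = {"A": Fraction(1, 2), "B": Fraction(0), "C": Fraction(1)}


def posterior_given_observed(prior, likelihood):
    """P(X = x | Y = U) by Bayes' rule: conditioning on 'U is observed'."""
    evidence = sum(prior[x] * likelihood[x] for x in prior)
    return {x: prior[x] * likelihood[x] / evidence for x in prior}


def posterior_given_occurred(prior, event):
    """P(X = x | X in U): naive conditioning on 'U has occurred'."""
    mass = sum(prior[x] for x in event)
    return {x: (prior[x] / mass if x in event else Fraction(0)) for x in prior}


print(posterior_given_observed(PRIOR, LIKELIHOOD))
print(posterior_given_occurred(PRIOR, {"A", "C"}))
```

The first result assigns 1/3 to A and 2/3 to C, the second assigns 1/2 to each, so the two ways of conditioning disagree for this observation process.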

Example 2.17 Consider a similar situation as in the previous example, but now assume
that just after the contestant has decided for herself that she would pick door $A$, but before
communicating her choice to the host, the host says "let me make things a little easier for
you", opens door $B$, and reveals a goat. Would changing from $A$ to $C$ now be advantageous?

The contestant performs a similar analysis as before, but now based on the following
assumptions: 1. A priori all doors are equally likely to hide the prize. 2. The host's decision
to open a door was independent of the location of the prize. 3. Given his decision to open
a door, the host chose by a fair coin flip one of the two doors not hiding the prize. Now

$$P(Y = \{A,C\} \mid X = A) = P(Y = \{A,C\} \mid X = C), \tag{17}$$

and hence

$$P(X = A \mid Y = \{A,C\}) = 1/2, \quad P(X = C \mid Y = \{A,C\}) = 1/2.$$

In particular here

$$P(X = A \mid Y = \{A,C\}) = P(X = A \mid X \in \{A,C\}),$$
$$P(X = C \mid Y = \{A,C\}) = P(X = C \mid X \in \{A,C\}),$$

i.e. the difference between "$\{A,C\}$ is observed" and "$\{A,C\}$ has occurred" can be ignored
for probability updating.
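
The three assumptions of this example determine a full coarse data distribution once the probability that the host decides to open a door is fixed. The sketch below introduces that probability as a hypothetical parameter `q`, which the example does not specify, and models "no door opened" as observing the whole state space, which is also an assumption made here. The posterior for the observation {A, C} does not depend on the value of `q`.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")


def host_distribution(q):
    """Joint P(X = x, Y = U) under assumptions 1 to 3 of Example 2.17.
    q is an assumed probability that the host decides to open a door."""
    everything = frozenset(DOORS)
    p = {}
    for x in DOORS:
        px = Fraction(1, 3)                      # assumption 1: uniform prior
        p[(x, everything)] = px * (1 - q)        # host opens nothing
        for opened in DOORS:
            if opened != x:                      # fair coin among the two goat doors
                p[(x, everything - {opened})] = px * q * Fraction(1, 2)
    return p


def posterior(p, U):
    """P(X = x | Y = U) for every door x."""
    evidence = sum(prob for (x, V), prob in p.items() if V == U)
    return {x: p.get((x, U), Fraction(0)) / evidence for x in DOORS}


p = host_distribution(Fraction(1, 2))
print(posterior(p, frozenset({"A", "C"})))
```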

The coarse data distribution in Example 2.16 is not d-car (as evidenced by (16)), whereas
the coarse data distribution in Example 2.17 is d-car (as shown, in part, by (17)). The
connection between ignorability in probability updating and the d-car assumption has been
shown in GvLR and GH. The following theorem restates this connection in our terminology.

Theorem 2.18 Let $P$ be a coarse data distribution. The following are equivalent:

(i) $P$ is d-car.

(ii) For all $x \in W$, $U \subseteq W$ with $x \in U$ and $P(Y = U) > 0$:

$$P(X = x \mid Y = U) = P(X = x \mid X \in U).$$

(iii) For all $x \in W$, $U \subseteq W$ with $x \in U$ and $P(X = x) > 0$:

$$P(Y = U \mid X = x) = \frac{P(Y = U)}{P(X \in U)}.$$

The equivalence (i)$\Leftrightarrow$(ii) is shown in GH, based on GvLR. For the equivalence with (iii)
see (Jaeger, 2005).
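
The three statements of Theorem 2.18 can be compared numerically on small examples. The following sketch encodes a coarse data distribution as a dictionary from pairs (x, U) to probabilities, a representation chosen here for illustration, and evaluates (i), (ii) and (iii) separately. The two test distributions correspond to Example 2.16 and to Example 2.17 under the additional assumption that the host always opens a door.

```python
from fractions import Fraction

ZERO = Fraction(0)


def marginals(p):
    """Return the marginals P(X = x) and P(Y = U) of p[(x, U)]."""
    p_x, p_y = {}, {}
    for (x, U), prob in p.items():
        p_x[x] = p_x.get(x, ZERO) + prob
        p_y[U] = p_y.get(U, ZERO) + prob
    return p_x, p_y


def holds_i(p):
    """(i): P(Y = U | X = x) is constant over x in U with P(X = x) > 0."""
    p_x, p_y = marginals(p)
    return all(
        len({p.get((x, U), ZERO) / p_x[x] for x in U if p_x.get(x, 0) > 0}) <= 1
        for U in p_y)


def holds_ii(p):
    """(ii): P(X = x | Y = U) = P(X = x | X in U) whenever P(Y = U) > 0."""
    p_x, p_y = marginals(p)
    for U, pu in p_y.items():
        if pu == 0:
            continue
        mass = sum(p_x.get(x, ZERO) for x in U)
        if any(p.get((x, U), ZERO) / pu != p_x.get(x, ZERO) / mass for x in U):
            return False
    return True


def holds_iii(p):
    """(iii): P(Y = U | X = x) = P(Y = U) / P(X in U) whenever P(X = x) > 0."""
    p_x, p_y = marginals(p)
    for U, pu in p_y.items():
        mass = sum(p_x.get(x, ZERO) for x in U)
        for x in U:
            if p_x.get(x, 0) > 0 and p.get((x, U), ZERO) / p_x[x] != pu / mass:
                return False
    return True


AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
sixth, third = Fraction(1, 6), Fraction(1, 3)

# Example 2.16: contestant chose A; host opens a goat door other than A.
MONTY = {("A", AC): sixth, ("A", AB): sixth, ("B", AB): third, ("C", AC): third}

# Example 2.17, assuming the host always opens one of the two goat doors.
EASIER = {(x, U): sixth for x in "ABC" for U in (AB, AC, BC) if x in U}

for name, dist in (("Example 2.16", MONTY), ("Example 2.17", EASIER)):
    print(name, holds_i(dist), holds_ii(dist), holds_iii(dist))
```

Only sets U that occur with positive probability need to be inspected, since for all other sets the three conditions hold trivially.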



3. Criteria for Car and Ccar 

Given a coarse data distribution P it is, in principle, easy to determine whether P is d-car 
(d-ccar) based on Definition 2.8, respectively Theorem 2.14 (though in case of d-ccar a 
test might require a search over possible families of partitions). However, typically P is 
not completely known. Instead, we usually have some partial information about P. In the 
case of statistical inference problems this information consists of a sample $U_1, \ldots, U_n$ of
the coarse data variable Y. In the case of conditional probabilistic inference, we know the 
marginal of P on W. In both cases we would like to decide whether the partial knowledge 
of P that we possess, in conjunction with certain other assumptions on the structure of 
P that we want to make, is consistent with d-car, respectively d-ccar, i.e. whether there 
exists a distribution P that is d-car (d-ccar), and satisfies our partial knowledge and our 
additional assumptions. 

In statistical problems, additional assumptions on P usually come in the form of a 
parametric representation of the distribution of X. When $X = (X_1, \ldots, X_k)$ is multivariate,
such a parametric representation can consist, for example, in a factorization of the joint
distribution of the $X_i$, as induced by certain conditional independence assumptions. In
probabilistic inference problems an analysis of the evidence gathering process can lead to 
assumptions about the likelihoods of possible observations. In all these cases, one has to 
determine whether the constraints imposed on P by the partial knowledge and assumptions 
are consistent with the constraints imposed by the d-car assumption. In general, this will 
lead to computationally very difficult optimization or constraint satisfaction problems. 

Like GH, we will focus in this section on a rather idealized special problem within this 
wider area, and consider the case where our constraints on P only establish what values the 
variables X and Y can assume with nonzero probability, i.e. the constraints on P consist 
of prescribed sets of support for X and Y. We can interpret this special case as a reduced 
form of a more specific statistical setting, by assuming that the observed sample $U_1, \ldots, U_n$
is used only to infer what observations are possible, and that the parametric model for
X, too, is used only to determine which $x \in W$ have nonzero probabilities. Similarly,
in the probabilistic inference setting, this special case occurs when the knowledge of the
distribution of X is used only to identify the x with $P(X = x) > 0$, and assumptions on
the evidence generation only pertain to the set of possible observations. 

GH represent a specific support structure of P in form of a 0, 1-matrix, which they 
call the "CARacterizing matrix". In the following definition we provide an equivalent, but 
different, encoding of support structures of P. 

Definition 3.1 A support hypergraph (for a given coarse data space $\Omega(W)$) is a hypergraph
of the form $(\mathcal{N}, W')$, where

• $\mathcal{N} \subseteq 2^W \setminus \emptyset$ is the set of nodes,

• $W' \subseteq W$ is the set of edges, such that each edge $x \in W'$ just contains the nodes
$\{U \in \mathcal{N} \mid x \in U\}$.

$(\mathcal{N}, W')$ is called the support hypergraph of the distribution P on $\Omega(W)$ iff
$\mathcal{N} = \{U \in 2^W \setminus \emptyset \mid P(Y = U) > 0\}$, and $W' = \{x \in W \mid P(X = x) > 0\}$.
A support hypergraph is car-compatible iff it is the support hypergraph of some d-car
distribution P.




Figure 2: Support hypergraphs for Examples 2.16 and 2.17 

Example 3.2 Figure 2 (a) shows the support hypergraph of the coarse data distribution in 
Example 2.16; (b) for Example 2.17. 
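To make the encoding of Definition 3.1 concrete, the following Python sketch extracts the support hypergraph from a finite joint distribution. The function name `support_hypergraph` and the numbers in `joint` are assumptions chosen for illustration; the distribution is merely one with two possible observations {A,B} and {A,C}, the support structure of Example 2.16.

```python
from fractions import Fraction


def support_hypergraph(joint):
    """Return (nodes, edges) of the support hypergraph of a joint distribution.

    `joint` maps pairs (x, U) to P(X = x, Y = U). Nodes are the observations U
    with P(Y = U) > 0; each state x with P(X = x) > 0 becomes the edge
    {U in nodes | x in U}.
    """
    nodes = {U for (_, U), p in joint.items() if p > 0}
    states = {x for (x, _), p in joint.items() if p > 0}
    edges = {x: frozenset(U for U in nodes if x in U) for x in states}
    return nodes, edges


AB, AC = frozenset("AB"), frozenset("AC")
# Assumed example distribution (illustrative numbers only).
joint = {
    ("A", AB): Fraction(1, 6), ("A", AC): Fraction(1, 6),
    ("B", AB): Fraction(1, 3),
    ("C", AC): Fraction(1, 3),
}
nodes, edges = support_hypergraph(joint)
```

In this instance the edges for B and C are proper subsets of the edge for A, because state A is compatible with both observations.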

The definition of support hypergraph may appear strange, as a much more natural definition
would take the states $x \in W$ with $P(X = x) > 0$ as the nodes, and the observations
$U \subseteq W$ as the edges. The support hypergraph of Definition 3.1 is just the dual of this
natural support hypergraph. It turns out that these duals are more useful for the purpose
of our analysis.

A support hypergraph can contain multiple edges containing the same nodes. This
corresponds to multiple states that are not distinguished by any of the possible observations.
Similarly, a support hypergraph can contain multiple nodes that belong to exactly the same
edges. This corresponds to different observations $U, U'$ with
$U \cap \{x \mid P(X = x) > 0\} = U' \cap \{x \mid P(X = x) > 0\}$. On the other hand, a support
hypergraph cannot contain any node that is not contained in at least one edge (this would
correspond to an observation U with $P(Y = U) > 0$ but $P(X \in U) = 0$). Similarly, it cannot
contain empty edges. These are the only restrictions on support hypergraphs:

Theorem 3.3 A hypergraph $(\mathcal{N}, \mathcal{E})$ with finite $\mathcal{N}$ and $\mathcal{E}$ is the support hypergraph of some
distribution P, iff each node in $\mathcal{N}$ is contained in at least one edge from $\mathcal{E}$, and all edges
are nonempty.

Proof: Let $W = \mathcal{E}$ and define $P(X = x) = 1/|\mathcal{E}|$ for all $x \in W$. For each node $n \in \mathcal{N}$ let
$U(n)$ be $\{x \in W \mid n \in x\}$ (nonempty!), and define $P(Y = U(n) \mid X = x) = 1/|x|$. Then
$(\mathcal{N}, \mathcal{E})$ is the support hypergraph of P. □
While (almost) every hypergraph, thus, can be the support hypergraph of some distribution,
only rather special hypergraphs can be the support hypergraphs of a d-car distribution.
Our goal, now, is to characterize these car-compatible support hypergraphs. The following
proposition gives a first such characterization. It is similar to Lemma 4.3 in GH.

Proposition 3.4 The support hypergraph $(\mathcal{N}, W')$ is car-compatible iff there exists a
function $\nu : \mathcal{N} \to (0,1]$, such that for all $x \in W'$

$\sum_{U \in \mathcal{N}: U \in x} \nu(U) = 1$.   (18)

Proof: First note that in this proposition we are looking at x and U as edges and nodes,
respectively, of the support hypergraph, so that writing $U \in x$ makes sense, and means the
same as $x \in U$ when x and U are seen as states, respectively sets of states, in the coarse
data space.

Suppose $(\mathcal{N}, W')$ is the support hypergraph of a d-car distribution P. It follows from
Theorem 2.18 that $\nu(U) := P(Y = U)/P(X \in U)$ defines a function $\nu$ with the required
property. Conversely, assume that $\nu$ is given. Let $P(X)$ be any distribution on W with
support $W'$. Setting $P(Y = U \mid X = x) := \nu(U)$ for all $U \in \mathcal{N}$ and $x \in W' \cap U$ extends P
to a d-car distribution whose support hypergraph is just $(\mathcal{N}, W')$. □

Corollary 3.5 If the support hypergraph contains (properly) nested edges, then it is not
car-compatible.

(If x is a proper subset of $x'$, then the sum in (18) for $x'$ exceeds the sum for x by at least
one strictly positive term, so the two sums cannot both be equal to 1.)

Example 3.6 The support hypergraph from Example 2.16 contains nested edges. Without
any numerical computations, it thus follows purely from the qualitative analysis of which
observations could have been made that the coarse data distribution is not d-car, and hence
conditioning is not a valid update strategy.

The proof of Proposition 3.4 shows (as already observed by GH) that if a support
hypergraph is car-compatible, then it is car-compatible for any given distribution $P(X)$
with support $W'$, i.e. the support assumptions encoded in the hypergraph, together with the
d-car assumption (if jointly consistent), do not impose any constraints on the distribution
of X (other than having the prescribed set of support). The same is not true for the
marginal of Y: for a car-compatible support hypergraph $(\mathcal{N}, W')$ there will usually also
exist distributions $P(Y)$ on $\mathcal{N}$ such that $P(Y)$ cannot be extended to a d-car distribution
with the support structure specified by the hypergraph $(\mathcal{N}, W')$.

Proposition 3.4 already provides a complete characterization of car-compatible support
hypergraphs, and can be used as the basis of a decision procedure for car-compatibility
using methods for linear constraint satisfaction. However, Proposition 3.4 does not provide
much real insight into what makes a support hypergraph car-compatible. Much
more intuitive insight is provided by Corollary 3.5. The criterion provided by Corollary 3.5
is not complete: as the following example shows, there exist support hypergraphs without
nested edges that are not car-compatible.

Figure 3: car-incompatible support hypergraph without nested edges

Example 3.7 Let $(\mathcal{N}, W')$ be as shown in Figure 3. By assuming the existence of a suitable
function $\nu$, and summing (18) once over $x_1$ and $x_2$, and once over $x_3, x_4, x_5$, we obtain the
contradiction $2 = \sum_{i=1}^{6} \nu(U_i) = 3$. Thus, $(\mathcal{N}, W')$ is not car-compatible.

We now proceed to extend the partial characterization of car-compatibility provided by 
Corollary 3.5 to a complete characterization. Our following result improves on theorem 4.4 
of GH by giving a necessary and sufficient condition for car-compatibility, rather than just 
several necessary ones, and, arguably, by providing a criterion that is more intuitive and 
easier to apply. Our characterization is based on the following definition. 

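Definition 3.8 Let $(\mathcal{N}, W')$ be a support hypergraph. Let $\mathbf{x} = x_1, \ldots, x_k$ be a finite
sequence of edges from $W'$, possibly containing repetitions of the same edge. Denote the
length k of the sequence by $|\mathbf{x}|$. For $x \in W'$ we denote with $1_x$ the indicator function on $\mathcal{N}$
induced by x, i.e.

$1_x(U) := 1$ if $U \in x$, and $1_x(U) := 0$ otherwise   ($U \in \mathcal{N}$).

The function $1_{\mathbf{x}}(U) := \sum_{x \in \mathbf{x}} 1_x(U)$ then counts the number of edges in $\mathbf{x}$ that contain U.
For two sequences $\mathbf{x}, \mathbf{x}'$ we write $1_{\mathbf{x}} \le 1_{\mathbf{x}'}$ iff $1_{\mathbf{x}}(U) \le 1_{\mathbf{x}'}(U)$ for all U.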

Example 3.9 For the support hypergraph in Figure 3 we have that $1_{(x_1,x_2)} = 1_{(x_3,x_4,x_5)}$ is
the function on $\mathcal{N}$ which is constant 1.

For $\mathbf{x} = (x_1, x_3, x_4, x_5)$ one obtains $1_{\mathbf{x}}(U) = 2$ for $U = U_1, U_2, U_3$, and $1_{\mathbf{x}}(U) = 1$ for
$U = U_4, U_5, U_6$. The same function also is defined by $\mathbf{x} = (x_1, x_1, x_2)$.

In any support hypergraph, one has that for two single edges $x, x'$: $1_x < 1_{x'}$ iff x is a
proper subset of $x'$.

We now obtain the following characterization (which is partly inspired by known con- 
ditions for the existence of finitely additive measures, see Bhaskara Rao & Bhaskara Rao, 
1983): 

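Theorem 3.10 The support hypergraph $(\mathcal{N}, W')$ is car-compatible iff for every two
sequences $\mathbf{x}, \mathbf{x}'$ of edges from $W'$ we have

$1_{\mathbf{x}} = 1_{\mathbf{x}'} \Rightarrow |\mathbf{x}| = |\mathbf{x}'|$, and   (19)
$1_{\mathbf{x}} \le 1_{\mathbf{x}'},\ 1_{\mathbf{x}} \ne 1_{\mathbf{x}'} \Rightarrow |\mathbf{x}| < |\mathbf{x}'|$.   (20)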
Proof: Denote $k := |W'|$, $l := |\mathcal{N}|$. Let $A = (a_{ij})$ be the incidence matrix of $(\mathcal{N}, W')$, i.e.
A is a $k \times l$ matrix with $a_{ij} = 1$ if $U_j \in x_i$, and $a_{ij} = 0$ if $U_j \notin x_i$ (using some indexings
i, j for $W'$ and $\mathcal{N}$).

The condition of Proposition 3.4 now reads:

there exists $\nu \in (0,1]^l$ with $A\nu = \mathbf{1}$   (21)

(here $\mathbf{1}$ is a vector of k ones). An edge indicator function $1_{\mathbf{x}}$ can be represented as a row
vector $z \in \mathbb{N}^k$, where $z_i$ is the number of times $x_i$ occurs in $\mathbf{x}$. Then $1_{\mathbf{x}}$ can be written as
the row vector $zA$, and the conditions of Theorem 3.10 become: for all $z, z' \in \mathbb{N}^k$:

$zA = z'A \Rightarrow z \cdot \mathbf{1} = z' \cdot \mathbf{1}$,   (22)
$zA \le z'A,\ zA \ne z'A \Rightarrow z \cdot \mathbf{1} < z' \cdot \mathbf{1}$.   (23)

Subtracting right sides, this is equivalent to: for all $z \in \mathbb{Z}^k$:

$zA = 0 \Rightarrow z \cdot \mathbf{1} = 0$,   (24)
$zA \le 0,\ zA \ne 0 \Rightarrow z \cdot \mathbf{1} < 0$.   (25)

Using Farkas's lemma (see e.g. Schrijver, 1986, Section 7.3), one now obtains that conditions
(24) and (25) are necessary and sufficient for (21). For the application of Farkas's
lemma to our particular setting one has to observe that since A and $\mathbf{1}$ are rational, it is
sufficient to have (24) and (25) for rational z (cf. Schrijver, 1986, p. 85). This, in turn, is
equivalent to having (24) and (25) for integer z. The strict positivity of the solution $\nu$ can
be derived from conditions (24) and (25) by analogous arguments as for Corollary 7.1k in
(Schrijver, 1986). □



Example 3.11 From Example 3.9 we immediately obtain that nested edges $x, x'$ violate
(20), and hence we again obtain Corollary 3.5. Also the sequences $(x_1, x_2)$ and $(x_3, x_4, x_5)$ of
the support hypergraph in Figure 3 violate (19), so that we again obtain the car-incompatibility
of that hypergraph.
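Conditions (19) and (20) lend themselves to a simple bounded search for violations, sketched below in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the dictionary `figure3` is one edge layout that is consistent with the counts reported in Example 3.9 (the figure itself is not reproduced here), and a search up to a fixed sequence length can refute car-compatibility but cannot establish it.

```python
from collections import Counter
from itertools import combinations_with_replacement


def indicator(sequence, edges):
    """Count, for each node U, how many edges of the sequence contain U."""
    counts = Counter()
    for x in sequence:
        counts.update(edges[x])
    return counts


def find_violation(edges, max_len):
    """Search edge sequences of length <= max_len for a violation of (19) or (20).

    Returns a violating pair of sequences, or None if the bounded search finds
    nothing. A result of None does NOT prove car-compatibility.
    """
    nodes = set().union(*edges.values())
    names = sorted(edges)
    # The indicator ignores the order of a sequence, so multisets suffice.
    seqs = [s for k in range(1, max_len + 1)
            for s in combinations_with_replacement(names, k)]
    ind = {s: indicator(s, edges) for s in seqs}
    for s in seqs:
        for t in seqs:
            if all(ind[s][U] <= ind[t][U] for U in nodes):
                equal = all(ind[s][U] == ind[t][U] for U in nodes)
                if equal and len(s) != len(t):
                    return s, t      # violates (19)
                if not equal and len(s) >= len(t):
                    return s, t      # violates (20)
    return None


# Assumed layout for Figure 3, chosen to match the counts in Example 3.9.
figure3 = {
    "x1": frozenset({"U1", "U2", "U3"}),
    "x2": frozenset({"U4", "U5", "U6"}),
    "x3": frozenset({"U1", "U4"}),
    "x4": frozenset({"U2", "U5"}),
    "x5": frozenset({"U3", "U6"}),
}
# Nested edges: edges B and C are proper subsets of edge A.
nested = {"A": frozenset({"AB", "AC"}), "B": frozenset({"AB"}),
          "C": frozenset({"AC"})}
# Three nodes, every edge contains exactly two of them.
triangle = {"A": frozenset({"AB", "AC"}), "B": frozenset({"AB", "BC"}),
            "C": frozenset({"AC", "BC"})}
```

For `triangle` no violation can exist at any length, since $\nu \equiv 1/2$ satisfies (18); for the other two hypergraphs a violation is already found among short sequences.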




Figure 4: Car-compatible support hypergraphs with three nodes (panels (a), (b-i), (b-ii), (c), (d))



Example 3.12 GH (Example 4.6) derive a complete characterization of car-compatibility
for the case that exactly three different observations can be made with positive probability. In
our framework this amounts to finding all support hypergraphs with three nodes that satisfy
the conditions of Theorem 3.10. The possible solutions are shown in Figure 4 (omitting
equivalent solutions obtained by duplicating edges). The labeling (a)-(d) of the solutions
corresponds to the case enumeration in GH. It is easy to verify that the shown support
hypergraphs all satisfy (19) and (20). That these are the only such hypergraphs follows from
the facts that adding a new edge to any of the shown hypergraphs either leads to a hypergraph
already on our list (only possible for the pair (b-i) and (b-ii)), or introduces a pair of nested
edges. Similarly, deleting any edge either leads to a hypergraph already shown, or to an
invalid hypergraph in which not all nodes are covered.

4. Procedural Models 

So far we have emphasized the distributional perspective on car. We have tried to iden- 
tify car from the joint distribution of complete and coarse data. From this point of view 
coarsening variables are an artificial construct that is introduced to describe the joint dis- 
tribution. In some cases, however, a coarsening variable can also model an actual physical, 
stochastic process that leads to the data coarsening. In such cases, the analysis should 
obviously take this concrete model for the underlying coarsening process into account. In 
this section we study d-car distributions in terms of such procedural models for the data 
generating mechanism. Our results in this section extend previous investigations of car 
mechanisms in GvLR and GH. 

Our first goal in this section is to determine canonical procedural models for coarsen- 
ing mechanisms leading to d-car data. Such canonical models can be used in practice for 
evaluating whether a d-car assumption is warranted for a particular data set under inves- 
tigation by matching the (partially known or hypothesized) coarsening mechanism of the 
data against any of the canonical models. Our investigation will then focus on properties 
that one may expect reasonable or natural procedural models to possess. These properties 
will be captured in two formal conditions of honesty and robustness. The analysis of these 
conditions will provide new strong support for the d-ccar assumption. 

The following definition of a procedural model is essentially a generalization of coarsening 
variables, obtained by omitting condition (9), and by replacing the single variable $G$ with
a (potentially infinite) sequence $\mathbf{G}$.

Definition 4.1 Let P be a coarse data distribution on $\Omega(W)$. A procedural model for P
is given by

• A random variable X distributed according to the marginal of P on W.

• A finite or infinite sequence $\mathbf{G} = G_1, G_2, \ldots$ of random variables, such that $G_i$ takes
values in a finite set $\Gamma_i$ ($i \ge 1$).

• A function $f : W \times \times_i \Gamma_i \to 2^W \setminus \emptyset$, such that $(X, f(X, \mathbf{G}))$ is distributed according
to P.

We also call a procedural model $(X, \mathbf{G}, f)$ a car model (ccar model), if the coarse data
distribution P it defines is d-car (d-ccar). In the following we denote $\times_i \Gamma_i$ with $\Gamma$.

Some natural coarsening processes are modeled with real-valued coarsening variables
(e.g. censoring times, Heitjan & Rubin, 1991). We can accommodate real-valued variables
Z in our framework by identifying Z with a sequence of binary random variables $Z_i$ ($i \ge 1$)
defining its binary representation. A sequence $\mathbf{G}$ containing a continuous $Z = G_i$ can then
be replaced with a sequence $\mathbf{G}'$ in which the original variables $G_j$ ($j \ne i$) are interleaved
with the binary $Z_j$.

In the following, we will not discuss measurability issues in detail (only in Appendix A
will a small amount of measure theory be needed). It should be mentioned, however, that it
is always assumed that W and the $\Gamma_i$ are equipped with a $\sigma$-algebra equal to their powerset,
that on $W \times \Gamma$ we have the generated product $\sigma$-algebra, and that $f^{-1}(U)$ is measurable
for all $U \subseteq W$.



f_2.16 |   A       B       C        f_2.17 |   A       B       C
-------+------------------------    -------+------------------------
   h   | {A,C}   {A,B}   {A,C}         h   | {A,C}   {B,C}   {B,C}
   t   | {A,B}   {A,B}   {A,C}         t   | {A,B}   {A,B}   {A,C}

Table 1: Procedural models for Examples 2.16 and 2.17



Example 4.2 Natural procedural models for the coarse data distributions of Examples 2.16
and 2.17 are constructed by letting $\mathbf{G} = F$ represent the coin flip that determines the door
to be opened. Suppose that in both examples the host has the following rule for matching
the result of the coin flip with potential doors for opening: when the coin comes up heads,
then the host opens the door that is first in alphabetical order among all doors (two, at
most) that his rules permit him to open. If the coin comes up tails, he opens the door last
in alphabetical order. This is formally represented by a procedural model in which X is
an $\{A, B, C\}$-valued, uniformly distributed random variable, $\mathbf{G}$ consists of a single
$\{h, t\}$-valued, uniformly distributed random variable F, and X and F are independent.

Table 1 completes the specification of the procedural models by defining the functions
$f_{2.16}$ and $f_{2.17}$ for the respective examples. Note that neither $(F, f_{2.16})$ nor $(F, f_{2.17})$ are
coarsening variables in the sense of Definition 2.5, as e.g. $A \in f_{2.16}(B, h) \ne f_{2.16}(A, h)$, in
violation of (9).
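The two procedural models of Table 1 are small enough to be evaluated exhaustively. The following Python sketch derives $P(Y = U \mid X = x)$ from each table and tests whether it is constant on $x \in U$, which is the property used in Theorem 2.18 (iii). The function names are hypothetical; since X has full support here, the test does not depend on the marginal of X.

```python
from collections import defaultdict
from fractions import Fraction

AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")

# The functions f_2.16 and f_2.17 of Table 1, keyed by (state x, coin result).
f_216 = {("A", "h"): AC, ("B", "h"): AB, ("C", "h"): AC,
         ("A", "t"): AB, ("B", "t"): AB, ("C", "t"): AC}
f_217 = {("A", "h"): AC, ("B", "h"): BC, ("C", "h"): BC,
         ("A", "t"): AB, ("B", "t"): AB, ("C", "t"): AC}


def coarsening_probs(f):
    """P(Y = U | X = x) when the coin F is fair and independent of X."""
    cond = defaultdict(Fraction)
    for (x, _coin), U in f.items():
        cond[(x, U)] += Fraction(1, 2)
    return dict(cond)


def is_d_car(cond):
    """Check that P(Y = U | X = x) takes a single value for all x in U."""
    for U in {V for (_, V) in cond}:
        values = {cond.get((x, U), Fraction(0)) for x in U}
        if len(values) > 1:
            return False
    return True
```

Applied to `f_216` the check fails for the observation {A,B}, where the conditional probabilities for A and B are 1/2 and 1; applied to `f_217` all conditional probabilities equal 1/2.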

The two procedures described in the preceding example appear quite similar, yet one 
produces a d-car distribution while the other does not. We are now interested in identifying 
classes of procedural models that are guaranteed to induce d-car (d-ccar) distributions. 
Conversely, for any given d-car distribution P, we would like to identify procedural models 
that might have induced P. We begin with a class of procedural models that stands in a 
trivial one-to-one correspondence with d-car distributions. 

Example 4.3 (Direct car model) Let X be a W-valued random variable, and $\mathbf{G} = G_1$ with
$\Gamma_1 = 2^W \setminus \emptyset$. Let the joint distribution of X and $G_1$ be such that $P(G_1 = U \mid X = x) = 0$
when $x \notin U$, and

$P(G_1 = U \mid X = x)$ is constant on $\{x \mid P(X = x) > 0, x \in U\}$.   (26)

Define $f(x, U) = U$. Procedural models of this form are just the coarsening variable
representations of d-car distributions that we already encountered in Theorem 2.9. Hence, a
coarse data distribution P is d-car iff it is induced by a direct car model.
The direct car models are not much more than a restatement of the d-car definition.
They do not help us very much in our endeavor to identify canonical observational or data- 
generating processes that will lead to d-car distributions, because the condition (26) does 
not correspond to an easily interpretable condition on an experimental setup. 

For d-ccar the situation is quite different: here a direct encoding of the d-ccar condition 
leads to a rather natural class of procedural models. The class of models described next 
could be called, in analogy to Example 4.3, "direct ccar models". Since the models described
here permit a more natural interpretation, we give them a different name, however.

Example 4.4 (Multiple grouped data model, MGD) Let X be a W-valued random variable.
Let $(\mathcal{W}_1, \ldots, \mathcal{W}_k)$ be a family of partitions of W (cf. Theorem 2.14). Let $\mathbf{G} = G_1$,
where $G_1$ takes values in $\{1, \ldots, k\}$ and is independent of X. Define $f(x, i)$ as that $U \in \mathcal{W}_i$
that contains x. Then $(X, G_1, f)$ is ccar. Conversely, every d-ccar coarse data model is
induced by such a multiple grouped data model.
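A multiple grouped data model can be written down in a few lines. The following Python sketch uses a hypothetical state space W = {1, 2, 3, 4}, two assumed partitions, and an assumed distribution `p_G` for the selector variable; partition indices start at 0 in the code, whereas the text counts from 1.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: W = {1, 2, 3, 4} with two partitions ("sensors").
partitions = [
    [frozenset({1, 2}), frozenset({3, 4})],
    [frozenset({1}), frozenset({2, 3}), frozenset({4})],
]
# Assumed distribution of the selector G_1, independent of X.
p_G = [Fraction(1, 3), Fraction(2, 3)]


def f(x, i):
    """Return the unique block of partition i that contains x."""
    return next(U for U in partitions[i] if x in U)


def coarsening_probs(states):
    """P(Y = U | X = x) induced by the MGD model."""
    cond = defaultdict(Fraction)
    for x in states:
        for i, p in enumerate(p_G):
            cond[(x, f(x, i))] += p
    return dict(cond)


cond = coarsening_probs([1, 2, 3, 4])
# For each observation U, P(Y = U | X = x) takes one value for all x in U.
constant_on_U = all(
    len({cond.get((x, U), Fraction(0)) for x in U}) == 1
    for U in {V for (_, V) in cond}
)
```

Because the selector is independent of X, the probability of reporting a block U depends only on which partition was selected, not on which state in U is the true one.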

The multiple grouped data model corresponds exactly to the CARgen procedure of GH. 
It allows intuitive interpretations as representing procedures where one randomly selects one 
out of k different available sensors or tests, each of which will reveal the true value of X only 
up to the accuracy represented by the set $U \in \mathcal{W}_i$ containing x. In the special case k = 1 this
corresponds to grouped or censored data (Heitjan & Rubin, 1991). GH introduced CARgen
as a procedure that is guaranteed to produce d-car distributions. They do not consider d-ccar,
and therefore do not establish the exact correspondence between CARgen and d-ccar.
In a similar vein, GvLR introduced a general procedure for generating d-car distributions. 
The following example rephrases the construction of GvLR in our terminology. 

Example 4.5 (Randomized monotone coarsening, RMC) Let X be a W-valued random
variable. Let $\mathbf{G} = H_1, S_1, H_2, S_2, \ldots, S_{n-1}, H_n$, where the $H_i$ take values in $2^W$, and the $S_i$
are $\{0,1\}$-valued. Define

$\bar{H}_i := H_i$ if $X \in H_i$, and $\bar{H}_i := W \setminus H_i$ if $X \notin H_i$.

Let the conditional distribution of $H_i$ given X and $\bar{H}_1, \ldots, \bar{H}_{i-1}$ be concentrated on subsets
of $\cap_{j=1}^{i-1} \bar{H}_j$.

This model represents a procedure where one successively refines a "current" coarse data
set $A_i := \cap_{j=1}^{i-1} \bar{H}_j$ by selecting a random subset $H_i$ of $A_i$ and checking whether $X \in H_i$ or
not, thus computing $\bar{H}_i$ and $A_{i+1}$. This process is continued until for the first time $S_i = 1$
(i.e. the $S_i$ represent stopping conditions). The result of the procedure, then, is represented
by the following function f:

$f(x, (H_1, S_1, \ldots, S_{n-1}, H_n)) = \cap_{i=1}^{\min\{k \mid S_k = 1\}} \bar{H}_i$.

Finally, we impose the conditional independence condition that the distribution of the
$H_i, S_i$ depend on X only through $\bar{H}_1, \ldots, \bar{H}_{i-1}$, i.e.

$P(H_i \mid X, \bar{H}_1, \ldots, \bar{H}_{i-1}) = P(H_i \mid \bar{H}_1, \ldots, \bar{H}_{i-1})$
$P(S_i \mid X, \bar{H}_1, \ldots, \bar{H}_{i-1}) = P(S_i \mid \bar{H}_1, \ldots, \bar{H}_{i-1})$.






As shown in GvLR, an RMC model always generates a d-car distribution, but not every
d-car distribution can be obtained in this way. GH state that RMC models are a special case
of CARgen models. As we will see below, CARgen and RMC are actually equivalent, and
thus, both correspond exactly to d-ccar distributions. The distribution of Example 2.17
is the standard example (already used in a slightly different form in GvLR) of a d-car
distribution that cannot be generated by RMC or CARgen. A question of considerable
interest, then, is whether there exist natural procedural models that correspond exactly to 
d-car distributions. GvLR state that they "cannot conceive of a more general mechanism 
than a randomized monotone coarsening scheme for constructing the car mechanisms which 
one would expect to meet with in practice, ..." (p. 267). GH, on the other hand, generalize
the CARgen models to a class of models termed CARgen*, and show that these exactly 
comprise the models inducing d-car distributions. 

However, the exact extent to which CARgen* is more natural or reasonable than the 
trivial direct car models has not been formally characterized. We will discuss this issue 
below. First we present another class of procedural models. This is a rather intuitive class 
which contains models not equivalent to any CARgen/RMC model.

Example 4.6 (Uniform noise model) Let X be a W-valued random variable. Let
$\mathbf{G} = N_1, H_1, N_2, H_2, \ldots$, where the $N_i$ are $\{0,1\}$-valued, and the $H_i$ are W-valued with

$P(H_i = x) = 1/|W|$   ($x \in W$).   (27)

Let $X, N_1, H_1, \ldots$ be independent. Define for $h_i \in W$, $n_i \in \{0,1\}$:

$f(x, (n_1, h_1, \ldots)) = \{x\} \cup \{h_i \mid i : n_i = 1\}$.   (28)

This model describes a procedure where in several steps (perhaps infinitely many) uniformly
selected states from W are added as noise to the observation. The random variables $N_i$
represent events that cause additional noise to be added. The distributions generated by this
procedure are d-car, because for all x, U with $x \in U$:

$P(Y = U \mid X = x) = P(\{h_i \mid i : n_i = 1\} = U) + P(\{h_i \mid i : n_i = 1\} = U \setminus \{x\})$.

By the uniformity condition (27), and the independence of the family $\{X, N_1, H_1, \ldots\}$, the
last probability term in this equation is constant for $x \in U$.

The uniform noise model cannot generate exactly the d-car distribution of Example 2.17.
However, it can generate the variant of that distribution that was originally given in GvLR.
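For a finite number of noise steps the uniform noise model can be evaluated exactly by enumeration. The following Python sketch does this for an assumed three-element state space and two noise steps with assumed probabilities $P(N_1 = 1) = 1/2$ and $P(N_2 = 1) = 1/4$; all names and numbers are illustrative choices, not taken from GvLR.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

W = ("a", "b", "c")
# Assumed: two noise steps, with P(N_1 = 1) = 1/2 and P(N_2 = 1) = 1/4.
p_noise = [Fraction(1, 2), Fraction(1, 4)]


def noise_set_distribution():
    """Distribution of the noise set {h_i | n_i = 1}, by exact enumeration."""
    dist = defaultdict(Fraction)
    steps = len(p_noise)
    for ns in product((0, 1), repeat=steps):
        for hs in product(W, repeat=steps):
            p = Fraction(1, len(W)) ** steps      # uniformity condition (27)
            for n, q in zip(ns, p_noise):
                p *= q if n else 1 - q
            noise = frozenset(h for n, h in zip(ns, hs) if n)
            dist[noise] += p
    return dict(dist)


def coarsening_probs():
    """P(Y = U | X = x) with Y = {x} united with the noise set, as in (28)."""
    dist = noise_set_distribution()
    cond = defaultdict(Fraction)
    for x in W:
        for noise, p in dist.items():
            cond[(x, noise | {x})] += p
    return dict(cond)


cond = coarsening_probs()
# d-car check: for each observation U the value is the same for all x in U.
is_car = all(
    len({cond.get((x, U), Fraction(0)) for x in U}) == 1
    for U in {V for (_, V) in cond}
)
```

The enumeration also reproduces the two-term decomposition of $P(Y = U \mid X = x)$ given in the example, since the noise set either already equals U or equals U without x.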

The uniform noise model is rather specialized, and far from being able to induce every 
possible d-car distribution. As mentioned above, GH have proposed a procedure called 
CARgen* for generating exactly all d-car distributions. This procedure is described in GH 
in the form of a randomized algorithm, but it can easily be recast in the form of a procedural 
model in the sense of Definition 4.1. We shall not pursue this in detail, however, and instead 
present a procedure that has the same essential properties as CARgen* (especially with 
regard to the formal "reasonableness conditions" we shall introduce below), but is somewhat 
simpler and perhaps slightly more intuitive. 
Example 4.7 (Propose and test model, P&T) Let X be a W-valued random variable. Let
$\mathbf{G} = G_1, G_2, \ldots$ be an infinite sequence of random variables taking values in $2^W \setminus \emptyset$. Let
$X, G_1, G_2, \ldots$ be independent, and the $G_i$ be identically distributed, such that

$\sum_{U : x \in U} P(G_i = U)$ is constant on $\{x \in W \mid P(X = x) > 0\}$.   (29)

Define

$f(x, (U_1, U_2, \ldots)) := U_i$ if $i = \min\{j \ge 1 \mid x \in U_j\}$, and
$f(x, (U_1, U_2, \ldots)) := W$ if $\{j \ge 1 \mid x \in U_j\} = \emptyset$.


The P&T model describes a procedure where we randomly propose a set $U \subseteq W$, test
whether $x \in U$, and return U if the result is positive (else continue). The condition (29)
can be understood as an unbiasedness condition, which ensures that for every $x \in W$ (with
$P(X = x) > 0$) we are equally likely to draw a positive test for x. The following theorem
is analogous to Theorem 4.9 in GH; the proof is much simpler, however.

Theorem 4.8 A coarse data distribution P is d-car iff it can be induced by a P&T model. 

Proof: That every distribution induced by a P&T model is d-car follows immediately from

$P(Y = U \mid X = x) = P(G_i = U) / \sum_{U' : x \in U'} P(G_i = U')$.   (30)

By (29) this is constant on $\{x \in U \mid P(X = x) > 0\}$ (note, too, that (29) ensures that the
sum in the denominator of (30) is nonzero for all x, and that in the definition of f the case
$\{j \ge 1 \mid x \in U_j\} = \emptyset$ only occurs with probability zero).

Conversely, let P be a d-car distribution on $\Omega(W)$. Define
$c := \sum_{U \in 2^W} P(Y = U \mid X \in U)$, and

$P(G_i = U) = P(Y = U \mid X \in U)/c$.

Since $P(Y = U \mid X \in U) = P(Y = U \mid X = x)$ for all $x \in U$ with $P(X = x) > 0$, we have
$\sum_{U : x \in U} P(Y = U \mid X \in U) = 1$ for all $x \in W$ with $P(X = x) > 0$. It follows that (29) is
satisfied with 1/c being the constant. The resulting P&T model induces the original P:

$P(f(X, \mathbf{G}) = U \mid X = x) = (P(Y = U \mid X \in U)/c) / (\sum_{U' : x \in U'} P(Y = U' \mid X \in U')/c)$
$= P(Y = U \mid X \in U) = P(Y = U \mid X = x)$.

□
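The converse direction of the proof is constructive, and can be traced on a small instance. The following Python sketch starts from assumed d-car coarsening probabilities (every x in U has $P(Y = U \mid X = x) = 1/2$ for the three two-element subsets of {A, B, C}), builds the proposal distribution $P(G_i = U)$, and evaluates the right-hand side of (30). The function names are hypothetical.

```python
from fractions import Fraction

AB, AC, BC = frozenset("AB"), frozenset("AC"), frozenset("BC")
W = ("A", "B", "C")

# Assumed d-car input: P(Y = U | X = x) = 1/2 for every x in U.
cond = {(x, U): Fraction(1, 2) for U in (AB, AC, BC) for x in U}


def propose_distribution(cond):
    """P(G_i = U) := P(Y = U | X in U) / c, as in the proof of Theorem 4.8."""
    # For a d-car distribution, P(Y = U | X in U) equals P(Y = U | X = x)
    # for any x in U, so one value per observation U is enough.
    lam = {U: p for (_, U), p in cond.items()}
    c = sum(lam.values())
    return {U: p / c for U, p in lam.items()}


def acceptance_mass(p_G, x):
    """Left-hand side of (29): total proposal probability of sets containing x."""
    return sum(p for V, p in p_G.items() if x in V)


def induced(p_G, x, U):
    """Right-hand side of (30): proposal probability of U, renormalized."""
    return p_G[U] / acceptance_mass(p_G, x)


p_G = propose_distribution(cond)
```

In this instance c = 3/2, every proposal has probability 1/3, and the acceptance mass is 2/3 = 1/c for each state, so the renormalization in (30) returns the original value 1/2.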

The P&T model looks like a reasonable natural procedure. However, it violates a desideratum
that GvLR have put forward for a natural coarsening procedure:

(D) In the coarsening procedure, no more information about the true value
of X should be used than is finally revealed by the coarse data variable Y (Gill
et al., 1997, p. 266, paraphrased).
The P&T model violates desideratum (D), because when we first unsuccessfully test
$U_1, \ldots, U_k$, then we require the information $x \notin \cup_{i=1}^{k} U_i$, which is not included in the
final data $Y = U_{k+1}$. The observation generating process of Example 2.17, too, appears to violate
(D), as the host requires the precise value of X when following his strategy. Finally, the 
uniform noise model violates (D), because in the computation (28) of the final coarse data 
output the exact value of X is required. These examples suggest that (D) is not a condition 
that one must necessarily expect every natural coarsening procedure to possess. (D) is 
most appropriate when coarse data is generated by an experimental process that is aimed 
at determining the true value of X, but may be unable to do so precisely. In such a scenario, 
(D) corresponds to the assumption that all information about the value of X that is collected 
in the experimental process also is reported in the final result. Apart from experimental 
procedures, also 'accidental' processes corrupting complete data can generate d-car data (as 
represented, e.g., by the uniform noise model). For such procedures (D) is not immediately 
seen as a necessary feature. However, Theorem 4.17 below will lend additional support to 
(D) also in these cases. 

GH argue that their class of CARgen* procedures only contains reasonable processes, 
because "each step of the algorithm can depend only on information available to the exper- 
imenter, where the 'information' is encoded in the observations made by the experimenter 
in the course of running the algorithm" (GH, p. 260). The same can be said about the
P&T procedure. The direct car model would not be reasonable in this sense, because for 
the simulation of the variable G one would need to pick a distribution dependent on the 
true value of X, which is not assumed to be available. However, it is hard to make rigorous 
this distinction between direct car models on the one hand, and CARgen* /P&T on the 
other hand, because the latter procedures permit tests for the value of X (through checking 
$X \in U$ for test sets U; using singleton sets U one can even query the exact value of X),
and the continuation of the simulation is dependent on the outcome of these tests. 

We will now establish a more solid foundation for discussing reasonable vs. unreason- 
able coarsening procedures by introducing two different rigorous conditions for natural or 
reasonable car procedures. One is a formalization of desideratum (D), while the other 
expresses an invariance of the car property under numerical parameter changes. We will 
then show that these conditions can only be satisfied when the generated distribution is 
d-ccar. For the purpose of this analysis it is helpful to restrict attention to a special type 
of procedural models. 

Definition 4.9 A procedural model $(X, \mathbf{G}, f)$ is a Bernoulli model if the family
$X, G_1, G_2, \ldots$ is independent.

The name Bernoulli model is not quite appropriate here, because the variables $X, G_i$
are not necessarily binary. However, it is clear that one could also replace the multinomial
X and $G_i$ with suitable sets of (independent) binary random variables. In essence, then,
a Bernoulli model in the sense of Definition 4.9 can be seen as an infinite sequence of 
independent coin tosses (with coins of varying bias). Focusing on Bernoulli models is no 
real limitation: 

Theorem 4.10 Let $(X, \mathbf{G}, f)$ be a procedural model. Then there exists a Bernoulli model
$(X, \mathbf{G}^*, f^*)$ inducing the same coarse data distribution.
The reader may notice that the statement of Theorem 4.10 really is quite trivial: the
coarse data distribution induced by $(X, \mathbf{G}, f)$ is just a distribution on the finite coarse
data space $\Omega(W)$, and there are many simple, direct constructions of Bernoulli models for
such a given distribution. The significance of Theorem 4.10, therefore, lies essentially in
the following proof, where we construct a Bernoulli model $(X, \mathbf{G}^*, f^*)$ that preserves all
the essential procedural characteristics of the original model $(X, \mathbf{G}, f)$. In fact, the model
$(X, \mathbf{G}^*, f^*)$ can be understood as an implementation of the procedure $(X, \mathbf{G}, f)$ using a
generator for independent random numbers.

To understand the intuition of the construction, consider a randomized algorithm for
simulating the procedural model $(X, \mathbf{G}, f)$. The algorithm successively samples values for
$X, G_1, G_2, \ldots$ and finally computes f (for most natural procedural models the value of f
is already determined by finitely many initial $G_i$-values, so that not infinitely many $G_i$
need be sampled, and the algorithm actually terminates; for our considerations, however,
algorithms taking infinite time pose no conceptual difficulties). The distribution used for
sampling $G_i$ may depend on the values of previously sampled $G_1, \ldots, G_{i-1}$, which, in a
computer implementation of the algorithm, are encoded in the current program state.

The set of all possible runs of the algorithm can be represented as a tree, where branching
nodes correspond to sampling steps for the $G_i$. A single execution of the algorithm
generates one branch in this tree. One can now construct an equivalent algorithm that,
instead, generates the whole tree breadth-first, and that labels each branching node with
a random value for the $G_i$ associated with the node, sampled according to the distribution
determined by the program state corresponding to that node. In this algorithm, sampling
of random values is independent. The labeling of all branching nodes identifies a unique
branch in the tree, and for each branch, the probability of being identified by the labeling is
equal to the probability of this branch representing the execution of the original algorithm
(a similar transformation by pre-computing all random choices that might become relevant
is described by Gill & Robins, 2001, Section 7). The following proof formalizes the
preceding informal description.

Proof of Theorem 4.10: For each random variable $G_i$ we introduce a sequence of random 
variables $G^*_{i,1}, \ldots, G^*_{i,K(i)}$, where $K(i) = |W \times \prod_{j=1}^{i-1} \Gamma_j|$ is the size of the joint state space of 
$X, G_1, \ldots, G_{i-1}$. The state space of the $G^*_{i,h}$ is $\Gamma_i$ (with regard to our informal explanation, 
$G^*_{i,h}$ corresponds to the node in the full computation tree that represents the sampling of $G_i$ 
when the previous execution has resulted in the $h$th out of $K(i)$ possible program states). 
We construct a joint distribution for $X$ and the $G^*_{i,h}$ by setting 
$P(G^*_{i,h} = v) = P(G_i = v \mid (X, G_1, \ldots, G_{i-1}) = s_h)$ 
($s_h$ the $h$th state in an enumeration of $W \times \prod_{j=1}^{i-1} \Gamma_j$), and by taking 
$X$ and the $G^*_{i,h}$ to be independent. 

It is straightforward to define a mapping 

$$h^* : W \times \prod_{i \geq 1} \Gamma_i^{K(i)} \to \Gamma$$

such that $(X, h^*(X, G^*))$ is distributed as $(X, G)$ (the mapping $h^*$ corresponds to the 
extraction of the "active" branch in the full labeled computation tree). Defining 
$f^*(x, g^*) := f(x, h^*(x, g^*))$ then completes the construction of the Bernoulli model. □ 
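
The construction can be illustrated on a small finite instance. The following Python sketch is a toy example under stated assumptions, not part of the original proof: the state spaces, the distribution `P_X` and the conditional sampling rule `cond` are hypothetical choices made only for illustration. It computes exactly (with rational arithmetic) the law of $(X, G)$ under sequential sampling and the law of $(X, h^*(X, G^*))$ when every branching node of the computation tree is labeled independently, and checks that the two agree.

```python
from fractions import Fraction
from itertools import product

W = [0, 1]                                  # state space of X (hypothetical)
P_X = {0: Fraction(1, 3), 1: Fraction(2, 3)}
N_STEPS = 2                                 # G_1, G_2
VALUES = [0, 1]                             # state space of every G_i

def cond(state):
    """Hypothetical P(G_i = . | (X, G_1, ..., G_{i-1}) = state)."""
    x, *gs = state
    p1 = Fraction(1 + x + sum(gs), 4 + len(gs))   # depends on the program state
    return {1: p1, 0: 1 - p1}

def sequential_distribution():
    """Exact law of (X, G_1, ..., G_n) when each G_i is sampled given the current state."""
    dist = {(x,): p for x, p in P_X.items()}
    for _ in range(N_STEPS):
        dist = {s + (v,): p * q for s, p in dist.items() for v, q in cond(s).items()}
    return dist

def all_states():
    """Enumerate the program states s_h at which some G_i is sampled (branching nodes)."""
    states = [(x,) for x in W]
    nodes = []
    for _ in range(N_STEPS):
        nodes.extend(states)
        states = [s + (v,) for s in states for v in VALUES]
    return nodes

def h_star(x, labels):
    """Extract the active branch from a full labeling of the computation tree."""
    state = (x,)
    for _ in range(N_STEPS):
        state = state + (labels[state],)
    return state

def transform_distribution():
    """Exact law of (X, h*(X, G*)) with X and all node labels G*_{i,h} independent."""
    nodes = all_states()
    dist = {}
    for x, px in P_X.items():
        for choice in product(VALUES, repeat=len(nodes)):
            labels = dict(zip(nodes, choice))
            p = px
            for s, v in labels.items():
                p *= cond(s)[v]             # independent label at node s
            branch = h_star(x, labels)
            dist[branch] = dist.get(branch, 0) + p
    return dist

print(sequential_distribution() == transform_distribution())
```

Labels at nodes off the active branch sum out to probability one, which is why the two laws coincide in this example.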

Definition 4.11 The Bernoulli model $(X, G^*, f^*)$ obtained via the construction of the proof 
of Theorem 4.10 is called the Bernoulli transform of $(X, G, f)$. 

Example 4.12 For a direct car model $(X, G, f)$ we obtain the Bernoulli transform 
$(X, (G^*_1, \ldots, G^*_n), f^*)$, where 

$$P(G^*_i = U) = P(G = U \mid X = x_i),$$

$$h^*(x_i, U_1, \ldots, U_n) = (x_i, U_i),$$

and so $f^*(x_i, U_1, \ldots, U_n) = U_i$. 

When the coarsening procedure is a Bernoulli model, then no information about $X$ is 
used for sampling the variables $G$. The only part of the procedure where $X$ influences the 
outcome is in the final computation of $Y = f(X, G)$. The condition that this computation 
should require only as much knowledge of $X$ as is finally revealed by $Y$ is now basically 
condition (9) for coarsening variables. The state space $\Gamma$ for $G$ now being (potentially) 
uncountable, it is however more appropriate to replace the universal quantification "for all 
$g$" in (9) with "for almost all $g$" in the probabilistic sense. We thus define: 

Definition 4.13 A Bernoulli model is honest, if for all $x, x'$ with $P(X = x) > 0$, 
$P(X = x') > 0$, and all $U \in 2^W \setminus \emptyset$: 

$$P(G \in \{g \mid f(x, g) = U,\ x' \in U \Rightarrow f(x', g) = U\}) = 1. \qquad (31)$$

Example 4.14 The Bernoulli model of Example 4.12 is not honest, because one can have 
for some $U_1, \ldots, U_n$ with $P(G = (U_1, \ldots, U_n)) > 0$: $U_i \neq U_j$, and $x_i, x_j \in U_i$, such that 

$$f^*(x_i, U_1, \ldots, U_n) = U_i \neq U_j = f^*(x_j, U_1, \ldots, U_n).$$
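
For models with a finite state space for $G$, condition (31) can be checked by enumeration. The sketch below is a minimal illustration under assumptions: the two-element state space `W`, the helper `is_honest`, and both example models are hypothetical, and every state of $X$ is assumed to have positive probability. It contrasts a multiple grouped data model with the Bernoulli transform of a direct car model in the style of Example 4.12.

```python
from itertools import product

def is_honest(W, P_G, f):
    """Check (31) for a finite model: G has law P_G (dict), f(x, g) returns a frozenset."""
    for g, p in P_G.items():
        if p == 0:
            continue                          # null sets of g are irrelevant for (31)
        for x, x2 in product(W, repeat=2):
            U = f(x, g)
            if x2 in U and f(x2, g) != U:     # x2 is in U but would be reported differently
                return False
    return True

W = ["a", "b"]
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")

# Multiple grouped data: g selects a partition, f reports the block containing x.
partitions = {0: [A, B], 1: [AB]}
P_mgd = {0: 0.5, 1: 0.5}

def f_mgd(x, g):
    return next(U for U in partitions[g] if x in U)

# Bernoulli transform of a direct car model: g = (U_a, U_b), f* returns the component for x.
P_given = {"a": {A: 0.5, AB: 0.5}, "b": {B: 0.5, AB: 0.5}}
P_direct = {(Ua, Ub): pa * pb
            for Ua, pa in P_given["a"].items()
            for Ub, pb in P_given["b"].items()}

def f_direct(x, g):
    return g[W.index(x)]

print(is_honest(W, P_mgd, f_mgd))         # True
print(is_honest(W, P_direct, f_direct))   # False
```

The second model fails on the labeling $(U_a, U_b) = (\{a,b\}, \{b\})$, which has positive probability and reports $\{a,b\}$ for $a$ but $\{b\}$ for $b$.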

Honest Bernoulli models certainly satisfy (D). On the other hand, there can be 
non-Bernoulli models that also seem to satisfy (D) (notably the RMC models, which were 
developed with (D) in mind). However, for non-Bernoulli models it appears hard to make 
precise the condition that the sampling of $G$ does not depend on $X$ beyond the fact that 
$X \in Y$.¹ The following theorem indicates that our formalization of (D) in terms of Bernoulli 
models only is not too narrow. 

Theorem 4.15 The Bernoulli transforms of MGD, CARgen and RMC models are honest. 

The proofs for all three types of models are elementary, though partly tedious. We omit 
the details here. 

1. The intuitive condition that $G$ must be independent of $X$ given $Y$ turns out to be inadequate. 

We now turn to a second condition for reasonable procedures. For this we observe that 
the MGD/CARgen/RMC models are essentially defined in terms of the "mechanical 
procedure" for generating the coarse data, whereas the direct car, the uniform noise, and the 
P&T models (and in a similar way CARgen*) rely on the numerical conditions (26), (27), 
and (29), respectively, on distributional parameters. These procedures, therefore, are fragile in 
the sense that slight perturbations of the parameters will destroy the d-car property of the 
induced distribution. We would like to distinguish robust car procedures as those for which 
the d-car property is guaranteed through the mechanics of the process alone (as determined 
by the state spaces of the $G_i$, and the definition of $f$), and does not depend on parameter 
constraints for the $G_i$ (which, in a more or less subtle way, can be used to mimic the brute 
force condition (26)). Thus, we will essentially consider a car procedure to be robust, if it 
stays car under changes of the parameter settings for the $G_i$. There are two points to 
consider before we can state a formal definition of this idea. First, we observe that our concept 
of robustness should again be based on Bernoulli models, since in non-Bernoulli models even 
arbitrarily small parameter changes can create or destroy independence relations between 
the variables $X, G$, and such independence relations, arguably, reflect qualitative rather 
than merely quantitative aspects of the coarsening mechanism. 

Secondly, we will want to limit permissible parameter changes to those that do not lead to 
such drastic quantitative changes that outcomes with previously nonzero probability become 
zero-probability events, or vice versa. This is in line with our perspective in Section 3, 
where the set of support of a distribution on a finite state space was viewed as a basic 
qualitative property. In our current context we are dealing with distributions on uncountable 
state spaces, and we need to replace the notion of identical support with the notion of 
absolute continuity: recall that two distributions $P, \tilde{P}$ on a state space $\Sigma$ are called mutually 
absolutely continuous, written $P \equiv \tilde{P}$, if $P(S) = 0 \Leftrightarrow \tilde{P}(S) = 0$ for all measurable $S \subseteq \Sigma$. 

For a distribution $P(G)$ on $\Gamma$, with $G$ an independent family, we can obtain $\tilde{P}(G)$ with 
$\tilde{P} \equiv P$, for example, by changing for finitely many $i$ parameter values $P(G_i = g) = r > 0$ 
to new values $\tilde{P}(G_i = g) = \tilde{r} > 0$. On the other hand, if e.g. $\Gamma_i = \{0, 1\}$, $P(G_i = 0) = 1/2$, 
and $\tilde{P}(G_i = 0) = 1/2 + \epsilon$ for all $i$ and some $\epsilon > 0$, then $P(\Gamma) \not\equiv \tilde{P}(\Gamma)$. For a distribution 
$P(X)$ of $X$ alone one has $P(X) \equiv \tilde{P}(X)$ iff $P$ and $\tilde{P}$ have the same support. 
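
To see why changing infinitely many parameters by a fixed amount destroys mutual absolute continuity, the following short derivation may help. It is a sketch that relies on the strong law of large numbers; the set $S$ is introduced here only for illustration and does not appear in the original argument.

```latex
\begin{align*}
S &:= \Bigl\{ g \in \{0,1\}^{\mathbb{N}} \;\Bigm|\;
      \lim_{n\to\infty} \tfrac{1}{n}\,\bigl|\{ i \le n : g_i = 0 \}\bigr| = \tfrac12 \Bigr\}, \\
P(S) &= 1 \quad \text{(the $G_i$ are i.i.d. with $P(G_i = 0) = \tfrac12$)}, \\
\tilde{P}(S) &= 0 \quad \text{(under $\tilde{P}$ the limit equals $\tfrac12 + \epsilon$ almost surely)}.
\end{align*}
```

Thus $S$ is a null set for one distribution but not for the other. When only finitely many parameters are changed and all changed values stay positive, no such separating set exists.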

Definition 4.16 A Bernoulli model $(X, G, f)$ is robust car (robust ccar), if it is car (ccar), 
and remains car (ccar) if the distributions $P(X)$ and $P(G_i)$ ($i \geq 1$) are replaced with 
distributions $\tilde{P}(X)$ and $\tilde{P}(G_i)$, such that $P(X) \equiv \tilde{P}(X)$ and $P(G) \equiv \tilde{P}(G)$. 

The Bernoulli transforms of MGD/CARgen are robust ccar. Of the class RMC we 
know, so far, that it is car. The Bernoulli transform of RMC can be seen to be robust car. 
The Bernoulli transforms of CARgen*/P&T, on the other hand, are not robust (and neither 
is the uniform noise model, which already is Bernoulli). We now come to the main result of 
this section, which basically identifies the existence of 'reasonable' procedural models with 
d-ccar. 
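
The difference between robust and fragile procedures in the sense of Definition 4.16 can be made concrete on a two-state example. The sketch below rests on assumptions: the models `mgd` and `direct`, the helper names, and the parameter values are hypothetical, and every state of $X$ is assumed to have positive probability. It checks the d-car property (constancy of $P(Y = U \mid X = x)$ over $x \in U$) before and after a support-preserving change of the parameters of $G$.

```python
from fractions import Fraction as F

W = ["a", "b"]
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")

def coarsening_probs(P_G, f):
    """Return P(Y = U | X = x) for a Bernoulli model: G ~ P_G independent of X, Y = f(X, G)."""
    table = {x: {} for x in W}
    for g, p in P_G.items():
        for x in W:
            U = f(x, g)
            table[x][U] = table[x].get(U, 0) + p
    return table

def is_d_car(P_G, f):
    """d-car check: P(Y = U | X = x) takes the same value for all x in U."""
    table = coarsening_probs(P_G, f)
    sets = {U for row in table.values() for U in row}
    return all(len({table[x].get(U, 0) for x in U}) == 1 for U in sets)

def mgd(lam):
    """Multiple grouped data: partition {{a},{b}} with probability lam, else {{a,b}}."""
    partitions = {0: [A, B], 1: [AB]}
    P_G = {0: lam, 1: 1 - lam}
    return P_G, lambda x, g: next(U for U in partitions[g] if x in U)

def direct(p_a, p_b):
    """Bernoulli transform of a direct model with P(G = AB | X = a) = p_a, P(G = AB | X = b) = p_b."""
    P_G = {(Ua, Ub): qa * qb
           for Ua, qa in {AB: p_a, A: 1 - p_a}.items()
           for Ub, qb in {AB: p_b, B: 1 - p_b}.items()}
    return P_G, lambda x, g: g[W.index(x)]

half = F(1, 2)
print(is_d_car(*mgd(half)), is_d_car(*mgd(F(3, 10))))                    # True True
print(is_d_car(*direct(half, half)), is_d_car(*direct(F(3, 5), half)))   # True False
```

In the grouped data model the d-car property survives the parameter change, whereas in the direct model it holds only when the two parameters happen to coincide.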

Theorem 4.17 The following are equivalent for a distribution $P$ on $\Omega(W)$: 

(i) P is induced by a robust car Bernoulli model. 

(ii) P is induced by a robust ccar Bernoulli model. 

(iii) P is induced by an honest Bernoulli model. 

(iv) P is d-ccar. 

The proof is given in Appendix A. Theorem 4.17 essentially identifies the existence of a 
natural procedural model for a d-car distribution with the property of being d-ccar, rather 
than merely d-car. This is a somewhat surprising result at first sight, given that M-mcar 
is usually considered an unrealistically strong assumption as compared to M-mar. There 
is no real contradiction here, however, as we have seen before that d-ccar is weaker than 
M-mcar. Theorem 4.17 indicates that in practice one may find many cases where d-ccar 
holds, but M-mcar is not fulfilled. 

5. Conclusion 

We have reviewed several versions of car conditions. They differ with respect to their 
formulation, which can be in terms of a coarsening variable, or in terms of a purely 
distributional constraint. The different versions are mostly non-equivalent. Some care, therefore, 
is required in determining for a particular statistical or probabilistic inference problem the 
appropriate car condition that is both sufficient to justify the intended form of inference, 
and the assumption of which is warranted for the observational process at hand. We argue 
that the distributional forms of car are the more relevant ones: when the observations are 
fully described as subsets of W, then the coarse data distribution is all that is required 
in the analysis, and the introduction of an artificial coarsening variable G can skew the 
analysis. 

Our main goal was to provide characterizations of coarse data distributions that satisfy 
d-car. We considered two types of such characterizations: the first type is a "static" 
description of d-car distributions in terms of their sets of support. Here we have derived a 
quite intuitive, complete characterization by means of the support hypergraph of a coarse 
data distribution. 

The second type of characterizations is in terms of procedural models for the observa- 
tional process that generates the coarse data. We have considered several models for such 
observational processes, and found that the arguably most natural ones are exactly those 
that generate observations which are d-ccar, rather than only d-car. This is somewhat 
surprising at first, because M-ccar is typically an unrealistically strong assumption (cf. 
Example 2.3). The distributional form, d-ccar, on the contrary, turns out to be perhaps the 
most natural assumption. The strongest support for the d-ccar assumption is provided 
by the equivalence (i) $\Leftrightarrow$ (iv) in Theorem 4.17: assuming d-car, but not d-ccar, means 
that we must be dealing with a fragile coarsening mechanism that produces d-car data only 
by virtue of some specific parameter settings. Since we usually do not know very much 
about the coarsening mechanism, the assumption of such a special parameter-equilibrium 
(as exemplified by (29)) will typically be unwarranted. 

Appendix A. Proof of Theorem 4.17 

Theorem 4.17 The following are equivalent for a distribution $P$ on $\Omega(W)$: 

(i) P is induced by a robust car Bernoulli model. 

(ii) P is induced by a robust ccar Bernoulli model. 

(iii) P is induced by an honest Bernoulli model. 

(iv) P is d-ccar. 

We begin with some measure theoretic preliminaries. Let $\mathcal{A}$ be the product $\sigma$-algebra 
on $\Gamma$ generated by the powersets $2^{\Gamma_i}$. The joint distribution $P(X, G)$ then is defined on the 
product of $2^W$ and $\mathcal{A}$. The $\sigma$-algebra $\mathcal{A}$ is generated by the cylinder sets 
$(g_1, g_2, \ldots, g_k) \times \prod_{j>k} \Gamma_j$ ($k \geq 0$, $g_h \in \Gamma_h$ for $h = 1, \ldots, k$). The cylinder sets also are the 
basis for a topology $\mathcal{O}$ on $\Gamma$. The space $(\Gamma, \mathcal{O})$ is compact (this can be seen directly, or by 
an application of Tikhonov's theorem). It follows that every probability distribution $P$ on 
$\mathcal{A}$ is regular, in particular for all $A \in \mathcal{A}$: 

$$P(A) = \inf\{P(O) \mid A \subseteq O \in \mathcal{O}\}$$

(see e.g. Cohn, 1993, Prop. 7.2.3). Here and in the following we use interchangeably 
the notation $P(A)$ and $P(G \in A)$. The former notation is sufficient for reasoning about 
probability distributions on $\mathcal{A}$, the latter emphasizes the fact that we are always dealing 
with distributions induced by the family $G$ of random variables. 

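Lemma A.1 Let $P(G)$ be the joint distribution on $\mathcal{A}$ of an independent family $G$. Let 
$A_1, A_2 \in \mathcal{A}$ with $A_1 \cap A_2 = \emptyset$ and $P(G \in A_1) = P(G \in A_2) > 0$. Then there exists a joint 
distribution $\tilde{P}(G)$ with $P(G) \equiv \tilde{P}(G)$ and $\tilde{P}(G \in A_1) \neq \tilde{P}(G \in A_2)$. 

Proof: Let $p := P(A_1)$. Let $\epsilon = p/2$ and $O \in \mathcal{O}$ such that $A_1 \subseteq O$ and $P(O) < p + \epsilon$. Using 
the disjointness of $A_1$ and $A_2$ one obtains $P(A_1 \mid O) > P(A_2 \mid O)$ (since $P(A_1 \cap O) = p$, 
while $P(A_2 \cap O) \leq P(O) - p < \epsilon$). Since the cylinder sets are 
a basis for $\mathcal{O}$, we have $O = \bigcup_{i \geq 0} Z_i$ for a countable family of cylinders $Z_i$. It follows that also 
for some cylinder set $Z = (g^*_1, g^*_2, \ldots, g^*_k) \times \prod_{j>k} \Gamma_j$ with $P(Z) > 0$: $P(A_1 \mid Z) > P(A_2 \mid Z)$. 
Now let $\delta > 0$ and define for $h = 1, \ldots, k$: 

$$\tilde{P}(G_h = g^*_h) := 1 - \delta; \qquad \tilde{P}(G_h = g) := \delta \, \frac{P(G_h = g)}{\sum_{g' \neq g^*_h} P(G_h = g')} \quad (g \neq g^*_h).$$

For $h \geq k + 1$: $\tilde{P}(G_h) := P(G_h)$. Then $\tilde{P}(G) \equiv P(G)$, $\tilde{P}(A_1 \mid Z) = P(A_1 \mid Z)$, 
$\tilde{P}(A_2 \mid Z) = P(A_2 \mid Z)$, and therefore: 

$$\tilde{P}(A_1) \geq (1 - \delta)^k P(A_1 \mid Z), \qquad \tilde{P}(A_2) \leq (1 - \delta)^k P(A_2 \mid Z) + 1 - (1 - \delta)^k.$$

For sufficiently small $\delta$ this gives $\tilde{P}(A_1) > \tilde{P}(A_2)$. □ 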
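
The perturbation used in this proof can be traced on a finite example. The following sketch is illustrative only: the two-coin family, the sets `A1` and `A2`, the cylinder prefix `(0,)` and the value of `delta` are hypothetical choices, and the helper names are not from the original text. It shifts mass $1 - \delta$ onto the cylinder prefix as in the proof and compares the probabilities of the two sets before and after.

```python
from fractions import Fraction as F
from itertools import product

def joint(marginals):
    """Law of an independent family from its marginals (list of dicts value -> prob)."""
    dist = {}
    for g in product(*[m.keys() for m in marginals]):
        p = F(1)
        for m, v in zip(marginals, g):
            p *= m[v]
        dist[g] = p
    return dist

def perturb(marginals, prefix, delta):
    """Put mass 1 - delta on prefix[h] in the first len(prefix) marginals, as in the proof."""
    new = []
    for h, m in enumerate(marginals):
        if h < len(prefix):
            rest = sum(p for v, p in m.items() if v != prefix[h])   # assumed > 0 here
            m = {v: (1 - delta) if v == prefix[h] else delta * p / rest
                 for v, p in m.items()}
        new.append(m)                        # marginals beyond the prefix stay unchanged
    return new

def prob(dist, event):
    return sum(dist[g] for g in event)

uniform = {0: F(1, 2), 1: F(1, 2)}
P = joint([uniform, uniform])
A1, A2 = {(0, 0)}, {(1, 1)}                  # disjoint, both of probability 1/4 under P

# Cylinder Z = {g_1 = 0}: P(A1 | Z) = 1/2 > 0 = P(A2 | Z).
delta = F(1, 10)
P_new = joint(perturb([uniform, uniform], prefix=(0,), delta=delta))

print(prob(P, A1) == prob(P, A2))            # True
print(prob(P_new, A1) > prob(P_new, A2))     # True
```

All outcomes keep positive probability under `P_new`, so in this finite example the perturbed distribution has the same support as the original one.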

Proof of Theorem 4.17: For simplification we may assume that $P(x) > 0$ for all $x \in W$. 
This is justified by the observation that none of the conditions (i)-(iv) is affected by adding 
or deleting states with zero probability from $W$. 

The implication (iv)$\Rightarrow$(ii) follows from Example 4.4 by the observation that MGD models 
are robust d-ccar Bernoulli models. (ii)$\Rightarrow$(i) is trivial. We will show (i)$\Rightarrow$(iii) and (iii)$\Rightarrow$(iv). 

First assume (i), and let $(X, G, f)$ be a robust car Bernoulli model inducing $P$. For 
$x \in W$ and $U \subseteq W$ denote 

$$A(x, U) := \{g \in \Gamma \mid f(x, g) = U\}.$$

The d-car property of $P$ is equivalent to 

$$P(G \in A(x, U)) = P(G \in A(x', U)) \qquad (32)$$

for all $x, x' \in U$. 

Condition (31) is equivalent to the condition that $P(G \in A(x, U) \setminus A(x', U)) = 0$ for 
$x, x' \in U$. Assume otherwise. Then, by (32), for $A_1 := A(x, U) \setminus A(x', U)$, $A_2 := A(x', U) \setminus A(x, U)$: 
$0 < P(G \in A_1) = P(G \in A_2)$. Applying Lemma A.1 we obtain a Bernoulli model 
$\tilde{P}(X, G) = P(X)\tilde{P}(G)$ with $\tilde{P}(X, G) \equiv P(X, G)$ and $\tilde{P}(G \in A_1) \neq \tilde{P}(G \in A_2)$. Then also 
$\tilde{P}(G \in A(x, U)) \neq \tilde{P}(G \in A(x', U))$, so that $\tilde{P}(X, G)$ is not d-car, contradicting (i). 

(iii)$\Rightarrow$(iv): Let 

$$\Gamma^* := \bigcap_{\substack{x, x' \in W,\ U \subseteq W: \\ x, x' \in U}} \{g \mid f(x, g) = U,\ x' \in U \Rightarrow f(x', g) = U\}.$$

Since the intersection is only over finitely many $x, x', U$, we obtain from (iii) that 
$P(G \in \Gamma^*) = 1$. For $U \subseteq W$ define $A(U) := A(x, U) \cap \Gamma^*$, where $x \in U$ is arbitrary. By the 
definition of $\Gamma^*$ the definition of $A(U)$ is independent of the particular choice of $x$. Define 
an equivalence relation $\sim$ on $\Gamma^*$ via 

$$g \sim g' \ :\Leftrightarrow\ \forall U \subseteq W:\ g \in A(U) \Leftrightarrow g' \in A(U). \qquad (33)$$

This equivalence relation partitions $\Gamma^*$ into finitely many equivalence classes $\Gamma^*_1, \ldots, \Gamma^*_k$. 
We show that for each $\Gamma^*_i$ and $g \in \Gamma^*_i$ the system 

$$\mathcal{W}_i := \{U \mid \exists x \in W:\ f(x, g) = U\} \qquad (34)$$

is a partition of $W$, and that the definition of $\mathcal{W}_i$ does not depend on the choice of $g$. The 
latter claim is immediate from the fact that for $g \in \Gamma^*$ 

$$f(x, g) = U \ \Leftrightarrow\ g \in A(U) \text{ and } x \in U. \qquad (35)$$

For the first claim assume that $f(x, g) = U$, $f(x', g) = U'$ with $U \neq U'$. In particular, 
$g \in A(U) \cap A(U')$. Assume there exists $x'' \in U \cap U'$. Then by (35) we would obtain 
both $f(x'', g) = U$ and $f(x'', g) = U'$, a contradiction. Hence, the sets $U$ in the $\mathcal{W}_i$ are 
pairwise disjoint. They also are a cover of $W$, because for every $x \in W$ there exists $U$ with 
$x \in U = f(x, g)$. 

We thus obtain that the given Bernoulli model is equivalent to the multiple grouped 
data model defined by the partitions $\mathcal{W}_i$, where partition $\mathcal{W}_i$ is selected with 
probability $P(G \in \Gamma^*_i)$. □ 
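
For a finite honest model, the passage from the Bernoulli model to a multiple grouped data model can be carried out by direct enumeration. The sketch below makes several assumptions: the model (two fair coin flips deciding between a fine and a coarse partition), the state space `W`, and the helpers `to_mgd` and `is_partition` are hypothetical and serve only as an illustration of the grouping step in the proof.

```python
from fractions import Fraction as F
from itertools import product

W = [0, 1, 2]
FINE = [frozenset({0}), frozenset({1, 2})]
COARSE = [frozenset({0, 1, 2})]
P_G = {g: F(1, 4) for g in product([0, 1], repeat=2)}   # two independent fair coins

def f(x, g):
    """Hypothetical honest model: the two coin flips decide which partition is reported."""
    blocks = FINE if g[0] == g[1] else COARSE
    return next(U for U in blocks if x in U)

def to_mgd(W, P_G, f):
    """Group the g of positive probability by the system {f(x, g) : x in W} they induce."""
    params = {}
    for g, p in P_G.items():
        if p > 0:
            system = frozenset(f(x, g) for x in W)
            params[system] = params.get(system, 0) + p
    return params

def is_partition(W, system):
    """True if the sets in `system` are pairwise disjoint and cover W."""
    return sum(len(U) for U in system) == len(W) and set().union(*system) == set(W)

params = to_mgd(W, P_G, f)
for system, weight in params.items():
    print(sorted(map(sorted, system)), weight, is_partition(W, system))
```

Each group of coin outcomes corresponds to one equivalence class of (33), and the accumulated weight is the probability with which the associated partition is selected.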